Abstract

This paper describes an architecture that combines the complementary strengths of probabilistic graphical models 
and declarative programming to enable robots to represent and reason with logic-based and probabilistic descriptions 
of uncertainty and domain knowledge. An action language is extended to support non-boolean fluents and non- 
deterministic causal laws. This action language is used to describe tightly-coupled transition diagrams at two levels 
of granularity, refining a coarse-resolution transition diagram of the domain to obtain a fine-resolution transition 
diagram. The coarse-resolution system description, and a history that includes (prioritized) defaults, are translated into 
an Answer Set Prolog (ASP) program. For any given goal, inference in the ASP program provides a plan of abstract 
actions. To implement each such abstract action probabilistically, the part of the fine-resolution transition diagram 
relevant to this action is identified, and a probabilistic representation of the uncertainty in sensing and actuation is 
included and used to construct a partially observable Markov decision process (POMDP). The policy obtained by 
solving the POMDP is invoked repeatedly to implement the abstract action as a sequence of concrete actions, with 
the corresponding observations being recorded in the coarse-resolution history and used for subsequent reasoning. 
The architecture is evaluated in simulation and on a mobile robot moving objects in an indoor domain, to show that it 
supports reasoning with violation of defaults, noisy observations and unreliable actions, in complex domains. 


1 Introduction 

Robots¹ are increasingly being used to assist humans in homes, offices and other complex domains. To truly assist
humans in such domains, robots need to be re-taskable and robust. We consider a robot to be re-taskable if its reasoning
system enables it to achieve a wide range of goals in a wide range of environments. We consider a robot to be robust
if it is able to cope with unreliable sensing, unreliable actions, changes in the environment, and the existence of
atypical environments, by representing and reasoning with different descriptions of knowledge and uncertainty. While
there have been many attempts, satisfying these desiderata remains an open research problem.

¹ We use the terms “robot” and “agent” interchangeably in this paper.

Robotics and artificial intelligence researchers have developed many approaches for robot reasoning, drawing on
ideas from two very different classes of systems for knowledge representation and reasoning, based on logic and
probability theory respectively. Systems based on logic incorporate compositionally structured commonsense knowledge
about objects and relations, and support powerful generalization of reasoning to new situations. Systems based on
probability reason optimally (or near optimally) about the effects of numerically quantifiable uncertainty in sensing
and action. There have been many attempts to combine the benefits of these two classes of systems, including work on
joint (i.e., logic-based and probabilistic) representations of state and action, and algorithms for planning and
decision-making in such formalisms. These approaches provide significant expressive power, but they also impose a significant
computational burden. More efficient (and often approximate) reasoning algorithms for such unified probabilistic- 
logical paradigms are being developed. However, practical robot systems that combine abstract task-level planning 
with probabilistic reasoning link, rather than unify, their logic-based and probabilistic representations, primarily
because roboticists often need to trade expressivity or correctness guarantees for computational speed. Information close
to the sensorimotor level is often represented probabilistically to quantitatively model the uncertainty in sensing and 
actuation, with the robot’s beliefs including statements such as “the robotics book is on the shelf with probability 
0.9”. At the same time, logic-based systems are used to reason with (more) abstract commonsense knowledge, which 
may not necessarily be natural or easy to represent probabilistically. This knowledge may include hierarchically
organized information about object sorts (e.g., a cookbook is a book), and default information that holds in all but a
few exceptional situations (e.g., “books are typically found in the library”). These representations are linked, in that 
the probabilistic reasoning system will periodically commit particular claims about the world being true, with some 
residual uncertainty, to the logical reasoning system, which then reasons about those claims as if they were true. There 
are thus languages of different expressive strengths, which are linked within an architecture. 

The existing work in architectures for robot reasoning has some key limitations. First, many of these systems are 
driven by the demands of robot systems engineering, and there is little formalization of the corresponding architectures. 
Second, many systems employ a logical language that is indefeasible, e.g., first-order predicate logic, and incorrect
commitments can lead to irrecoverable failures. Our proposed architecture addresses these limitations. It represents 
and reasons about the world, and the robot’s knowledge of it, at two granularities. A fine-resolution description of 
the domain, close to the data obtained from the robot’s sensors and actuators, is reasoned about probabilistically, 
while a coarse-resolution description of the domain, including commonsense knowledge, is reasoned about using
non-monotonic logic. Our architecture precisely defines the coupling between the representations at the two granularities,
enabling the robot to represent and efficiently reason about commonsense knowledge, what the robot does not know, 
and how actions change the robot’s knowledge. The interplay between the two types of knowledge is viewed as a 
conversation between, and the (physical and mental) actions of, a logician and a statistician. Consider, for instance, 
the following exchange: 

Logician: the goal is to find the robotics book. I do not know where it is, but I know that books are typically in the 
library and I am in the library. We should first look for the robotics book in the library.

Logician —> Statistician: look for the robotics book in the library. You only need to reason about the robotics book 
and the library. 

Statistician: In my representation of the world, the library is a set of grid cells. I shall determine how to locate the 
book probabilistically in these cells considering the probabilities of movement failures and visual processing 
failures. 

Statistician: I visually searched for the robotics book in the grid cells of the library, but did not find the book. Although 
there is a small probability that I missed the book, I am prepared to commit that the robotics book is not in the 
library. 

Statistician —> Logician: here are my observations from searching the library; the robotics book is not in the library. 

Logician: the robotics book was not found in the library either because it was not there, or because it was moved to 
another location. The next default location for books is the bookshelf in the lab. We should go look there next. 

and so on... 

where the representations used by the logician and the statistician, and the communication of information between
them, are coordinated by a controller. This imaginary exchange illustrates key features of our approach:

• Reasoning about the states of the domain, and the effects of actions, happens at different levels of granularity, 
e.g., the logician reasons about rooms, whereas the statistician reasons about grid cells in those rooms. 

• For any given goal, the logician computes a plan of abstract actions, and each abstract action is executed
probabilistically as a sequence of concrete actions planned by the statistician.
• The effects of the coarse-resolution (logician’s) actions are non-deterministic, but the statistician’s fine-resolution
action effects, and thus the corresponding beliefs, have probabilities associated with them. 

• The coarse-resolution knowledge base (of the logician) may include knowledge of things that are irrelevant to 
the current goal. Probabilistic reasoning at fine resolution (by the statistician) only considers things deemed relevant
to the current coarse-resolution action. 

• Fine-resolution probabilistic reasoning about observations and actions updates probabilistic beliefs, and highly 
likely statements (e.g., probability > 0.9) are considered as being completely certain for subsequent coarse- 
resolution reasoning (by the logician). 
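The last point can be illustrated with a minimal Python sketch. It assumes that fine-resolution beliefs are stored as a mapping from statement strings to probabilities and that the commit threshold is the 0.9 mentioned above; the function name `commit_beliefs` and the statement strings are hypothetical and are not the architecture's actual interface.

```python
from typing import Dict, List

# Assumption: threshold taken from the "probability > 0.9" example in the text.
COMMIT_THRESHOLD = 0.9


def commit_beliefs(beliefs: Dict[str, float],
                   threshold: float = COMMIT_THRESHOLD) -> List[str]:
    """Return the statements the statistician is prepared to commit to.

    Each key is a statement about the domain and each value is its
    probability under the fine-resolution belief. Statements whose
    probability strictly exceeds the threshold are passed to the logician,
    who then treats them as certain; all others are withheld.
    """
    return sorted(stmt for stmt, p in beliefs.items() if p > threshold)


if __name__ == "__main__":
    beliefs = {
        "loc(rob1) = main_library": 0.95,
        "loc(rob1) = office": 0.05,
        "loc(book1) != main_library": 0.92,  # book not found after a search
    }
    print(commit_beliefs(beliefs))
```

Committed statements carry residual uncertainty that the logician no longer sees, which is one reason the coarse-resolution reasoning has to remain non-monotonic.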

1.1 Technical Contributions 

The design of our architecture is based on tightly-coupled transition diagrams at two levels of granularity. A coarse- 
resolution description includes commonsense knowledge, and the fine-resolution transition diagram is defined as a 
refinement of the coarse-resolution transition diagram. For any given goal, non-monotonic logical reasoning with the 
coarse-resolution system description and the system’s recorded history results in a sequence of abstract actions. Each
such abstract action is implemented as a sequence of concrete actions by zooming to a part of the fine-resolution
transition diagram relevant to this abstract action, and probabilistically modeling the non-determinism in action outcomes.
The technical contributions of this architecture are summarized below. 

Action language extensions. An action language is a formalism used to model action effects, and many action
languages have been developed and used in robotics, e.g., STRIPS, PDDL [19], BC [32], and ALd [17]. We extend
ALd in two ways to make it more expressive. First, we allow fluents (domain properties that can change) that are
non-Boolean, which allows us to compactly model a much wider range of situations. Second, we allow non-deterministic
causal laws, which capture the non-deterministic effects of the robot’s actions, not only in probabilistic but also in
qualitative terms. This extended version of ALd is used to describe the coarse-resolution and fine-resolution transition
diagrams of the proposed architecture.
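To make the two extensions concrete, here is a toy Python sketch (not ALd syntax) of the transition semantics they induce: the non-boolean fluent `loc(rob1)` ranges over places, and a non-deterministic causal law returns a set of possible values rather than a single one. The representation of states as dictionaries and the law `move_to_kitchen` are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, List, Set

# A state assigns a value to every fluent; values need not be boolean.
State = Dict[str, str]
# A (possibly non-deterministic) causal law for one action: given the
# current state, return the set of values each affected fluent may take.
CausalLaw = Callable[[State], Dict[str, Set[str]]]


def successors(state: State, laws: List[CausalLaw]) -> List[State]:
    """Enumerate the states the action may lead to.

    Fluents not mentioned by any law keep their current value (inertia).
    If several laws constrain the same fluent, only values allowed by all
    of them survive; an empty intersection yields no successor.
    """
    effects: Dict[str, Set[str]] = {}
    for law in laws:
        for fluent, values in law(state).items():
            effects[fluent] = effects.get(fluent, set(values)) & set(values)
    fluents = sorted(effects)
    result: List[State] = []
    for combo in product(*(sorted(effects[f]) for f in fluents)):
        new_state = dict(state)                # start from old values (inertia)
        new_state.update(zip(fluents, combo))  # apply one choice of effects
        result.append(new_state)
    return result


def move_to_kitchen(state: State) -> Dict[str, Set[str]]:
    """Hypothetical law: move(rob1, kitchen) may fail, leaving rob1 in place."""
    return {"loc(rob1)": {"kitchen", state["loc(rob1)"]}}


if __name__ == "__main__":
    s0 = {"loc(rob1)": "office", "loc(book1)": "main_library"}
    for s in successors(s0, [move_to_kitchen]):
        print(s)
```

No probabilities appear here: the non-determinism is purely qualitative, which matches how the coarse-resolution diagram treats it.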

Defaults, histories and explanations. Our architecture makes three contributions related to reasoning with default 
knowledge and histories. First, we expand the notion of the history of a dynamic domain, which typically includes 
a record of actions executed and observations obtained (by the robot), to support the representation of (prioritized) 
default information. We can, for instance, say that a textbook is typically found in the library and, if it is not there, it is 
typically found in the auxiliary library. Second, we define the notion of a model of a history with defaults in the initial 
state, enabling the robot to reason with such defaults. Third, we limit reasoning with such expanded histories to the 
coarse resolution, and enable the robot to efficiently (a) use default knowledge to compute plans to achieve the desired 
goal; and (b) reason with history to generate explanations for unexpected observations. For instance, in the absence 
of knowledge about the location of a specific object, the robot can construct a plan using the object’s default location
to speed up search. Also, the robot can build a revised model of the history to explain subsequent observations that 
contradict expectations based on initial assumptions. 
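The Python sketch below approximates, procedurally, how prioritized defaults guide the search for an object. In the architecture itself these defaults are part of a history reasoned about in ASP, and explanations come from non-monotonic inference rather than from a loop; the names `TEXTBOOK_DEFAULTS` and `expected_location` are hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical prioritized defaults mirroring the example in the text:
# textbooks are typically in the main library; if not, in the auxiliary one.
TEXTBOOK_DEFAULTS: List[str] = ["main_library", "aux_library"]


def expected_location(defaults: List[str], ruled_out: Set[str]) -> Optional[str]:
    """Return the highest-priority default location not yet contradicted.

    `ruled_out` holds places where observations showed the object is absent.
    Returning None means every default has been violated, so the object must
    be in some atypical location and further explanation is needed.
    """
    for place in defaults:
        if place not in ruled_out:
            return place
    return None


if __name__ == "__main__":
    print(expected_location(TEXTBOOK_DEFAULTS, set()))
    print(expected_location(TEXTBOOK_DEFAULTS, {"main_library"}))
```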

Tightly-coupled transition diagrams. The next set of contributions are related to the relationship between different 
models of the domain used by the robot, i.e., the tight coupling between the transition diagrams at two resolutions. 
First, we provide a formal definition of one transition diagram being a refinement of another, and use this definition
to formalize the notion of the coarse-resolution transition diagram being refined to obtain the fine-resolution transition
diagram; the fact that both transition diagrams are described in the same language facilitates their construction
and this formalization. A coarse-resolution state is, for instance, magnified to provide multiple states at the fine
resolution; the corresponding ability to reason about space at two different resolutions is central for scaling to larger
environments. We find two resolutions to be practically sufficient for many robot tasks, and leave extensions to other 
resolutions as an open problem. Second, we define randomization of a fine-resolution transition diagram, replacing 
deterministic causal laws by non-deterministic ones. Third, we formally define and automate zooming to a part of the
fine-resolution transition diagram relevant to a specific coarse-resolution transition, allowing the robot, while executing
any given abstract action, to avoid considering parts of the fine-resolution diagram irrelevant to this action, e.g., a
robot moving between two rooms only considers its location in the cells in those rooms.
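A toy Python sketch of refinement and zooming for the movement example follows, assuming a fixed mapping from places to grid cells. `CELLS`, `coarse_of` and `zoom_move` are illustrative names, and the formal definitions of refinement and zooming in the paper are considerably more general than this lookup.

```python
from typing import Dict, List, Set

# Hypothetical refinement: each coarse-resolution place is magnified into
# fine-resolution grid cells (the cell names are made up for illustration).
CELLS: Dict[str, List[str]] = {
    "office": ["c1", "c2"],
    "kitchen": ["c3", "c4"],
    "main_library": ["c5", "c6", "c7"],
    "aux_library": ["c8"],
}


def coarse_of(cell: str, cells: Dict[str, List[str]] = CELLS) -> str:
    """Map a fine-resolution cell back to the coarse place it refines."""
    for place, members in cells.items():
        if cell in members:
            return place
    raise KeyError(cell)


def zoom_move(origin: str, destination: str,
              cells: Dict[str, List[str]] = CELLS) -> Set[str]:
    """Cells relevant to moving rob1 from `origin` to `destination`.

    Only cells refining the two rooms involved are kept; the rest of the
    fine-resolution diagram is ignored while this abstract action runs.
    """
    return set(cells[origin]) | set(cells[destination])


if __name__ == "__main__":
    print(sorted(zoom_move("office", "kitchen")))
```

Here the zoomed description keeps 4 of the 8 cells, so a probabilistic model built for this action only needs to range over those cells.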

Dynamic generation of probabilistic representations. The next set of innovations connects the contributions
described so far to quantitative models of action and observation uncertainty. First, we use a semi-supervised algorithm,
the randomized fine-resolution transition diagram, prior knowledge (if any), and experimental trials, to collect
statistics and compute probabilities of fine-resolution action outcomes and observations. Second, we provide an algorithm
that, for any given abstract action, uses these computed probabilities and the zoomed fine-resolution description to
automatically construct the data structures for, and thus significantly limit the computational requirements of,
probabilistic reasoning. Third, based on the coupling between transition diagrams at the two resolutions, the outcomes of
probabilistic reasoning update the coarse-resolution history for subsequent reasoning.
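As a much simpler stand-in for the statistics-collection step, the Python sketch below turns counts from experimental trials into outcome probabilities using relative frequencies with additive (Laplace) smoothing. This is not the semi-supervised algorithm referred to above; the function name `outcome_probabilities` and the (action, outcome) record format are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def outcome_probabilities(
    trials: List[Tuple[str, str]],
    outcomes: Dict[str, List[str]],
    pseudo_count: float = 1.0,
) -> Dict[str, Dict[str, float]]:
    """Estimate P(outcome | action) from (action, outcome) trial records.

    `outcomes` lists the outcomes each action may have, as permitted by the
    randomized description. `pseudo_count` stands in for prior knowledge via
    additive smoothing; 0 gives raw relative frequencies.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for action, outcome in trials:
        counts[action][outcome] += 1
    table: Dict[str, Dict[str, float]] = {}
    for action, possible in outcomes.items():
        total = sum(counts[action][o] for o in possible) + pseudo_count * len(possible)
        if total == 0:  # no data and no prior: fall back to a uniform estimate
            table[action] = {o: 1.0 / len(possible) for o in possible}
        else:
            table[action] = {o: (counts[action][o] + pseudo_count) / total
                             for o in possible}
    return table


if __name__ == "__main__":
    trials = [("move", "success")] * 8 + [("move", "fail")] * 2
    print(outcome_probabilities(trials, {"move": ["success", "fail"]}))
```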

Methodology and architecture. The final set of contributions are related to the overall architecture. First, for the 
design of the software components of robots that are re-taskable and robust, we articulate a methodology that is 
rather general, provides a path for proving correctness of these components, and enables us to predict the robot’s
behavior. Second, the proposed knowledge representation and reasoning architecture combines the representation and
reasoning methods from action languages, declarative programming, probabilistic state estimation and probabilistic 
planning, to support reliable and efficient operation. The domain representation for logical reasoning is translated into 
a program in SPARC [2], an extension of CR-Prolog, and the representation for probabilistic reasoning is translated
into a partially observable Markov decision process (POMDP) [27]. CR-Prolog g) (and thus SPARC) incorporates
consistency-restoring rules in Answer Set Prolog (ASP) (in this paper, the terms ASP, CR-Prolog and SPARC are
often used interchangeably), and has a close relationship with our action language, allowing us to reason efficiently with
hierarchically organized knowledge and default knowledge, and to pose state estimation, planning, and explanation 
generation within a single framework. Also, using an efficient approximate solver to reason with POMDPs supports 
a principled and quantifiable trade-off between accuracy and computational efficiency in the presence of uncertainty, 
and provides a near-optimal solution under certain conditions [27], [37]. Third, our architecture avoids exact, inefficient
probabilistic reasoning over the entire fine-resolution representation, while still tightly coupling the reasoning at
different resolutions. This intentional separation of non-monotonic logical reasoning and probabilistic reasoning is at the
heart of the representational elegance, reliability and inferential efficiency provided by our architecture. 
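The statistician's side rests on belief updates of the kind sketched below in Python: one Bayes-filter step over grid cells, assuming a simple detection model with hypothetical hit and false-positive rates. It shows only probabilistic state estimation; computing a POMDP policy with an approximate solver, as the architecture does, is not shown, and the cell names and the `elsewhere` entry are illustrative.

```python
from typing import Dict


def update_belief(belief: Dict[str, float], looked_at: str, seen: bool,
                  p_detect: float = 0.9, p_false: float = 0.05) -> Dict[str, float]:
    """Bayes update of P(object in cell) after examining one cell.

    p_detect = P(seen | object is in `looked_at`);
    p_false  = P(seen | object is elsewhere), i.e. a false positive.
    """
    posterior: Dict[str, float] = {}
    for cell, prior in belief.items():
        if cell == looked_at:
            likelihood = p_detect if seen else 1.0 - p_detect
        else:
            likelihood = p_false if seen else 1.0 - p_false
        posterior[cell] = prior * likelihood
    norm = sum(posterior.values())
    return {cell: p / norm for cell, p in posterior.items()}


if __name__ == "__main__":
    # "elsewhere" stands for "not in the library" (hypothetical encoding).
    belief = {"c5": 0.45, "c6": 0.45, "elsewhere": 0.10}
    for cell in ["c5", "c6", "c5", "c6"]:
        belief = update_belief(belief, cell, seen=False)
    print(round(belief["elsewhere"], 3))
```

Repeated unsuccessful looks shift probability mass to `elsewhere`; once that mass exceeds a commit threshold, the statistician could report that the book is not in the library, as in the exchange in Section 1.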

The proposed architecture is evaluated in simulation and on a physical robot finding and moving objects in an indoor 
domain. We show that the architecture enables a robot to reason with violation of defaults, noisy observations, and 
unreliable actions, in larger, more complex domains, e.g., with more rooms and objects, than was possible before. 


1.2 Structure of the Paper 

The remainder of the paper is organized as follows. Section 2 introduces a domain used as an illustrative example
throughout the paper, and Section 3 discusses related work in knowledge representation and reasoning for robots.
Section 4 presents the methodology associated with the proposed architecture, and Section 5 introduces definitions
of basic notions used to build mathematical models of the domain. Section 5.1 describes the action language used
to describe the architecture’s coarse-resolution and fine-resolution transition diagrams, and Section 5.2 introduces
histories with initial state defaults as an additional type of record, describes models of system histories, and reduces
planning with the coarse-resolution domain representation to computing the answer set of the corresponding ASP
program. Section 6 provides the logician’s domain representation based on these definitions. Next, Section 7 describes
the (a) refinement of the coarse-resolution transition diagram to obtain the fine-resolution transition diagram; (b)
randomization of the fine-resolution system description; (c) collection of statistics to compute the probability of action
outcomes and observations; and (d) zooming to the part of the randomized system description relevant to the execution
of any given abstract action. Section 8 then describes how a POMDP is constructed and solved to obtain a policy
that implements the abstract action as a sequence of concrete actions. The overall control loop of the architecture is
described in Section 9. Section 10 describes the experimental results in simulation and on a mobile robot, followed
by conclusions in Section 11. In what follows, we refer to the functions and abstract actions of the coarse-resolution
transition diagram as being “high level”, using H as the subscript or superscript. Concrete functions and actions of the
fine-resolution transition diagram are referred to as being “low level”, using L as the subscript or superscript.

Figure 1: (a) Domain map: subset of the map of an entire floor of a building, with specific places labeled as shown and
used in the goals assigned to the robot; (b)-(c) the “Peoplebot” and “Turtlebot” robot platforms used in the experimental trials.

2 Illustrative Example: Office Domain 

The following domain (with some variants) will be used as an illustrative example throughout the paper. 

Example 1. [Office Domain] Consider a robot that is assigned the goal of moving specific objects to specific places 
in an office domain. This domain contains: 

• The sorts: place, thing, robot, and object, with object and robot being subsorts of thing. Sorts textbook,
printer and kitchenware are subsorts of the sort object. Sort names and constants are written in lowercase,
while variable names are in uppercase.

• Four specific places: office, main_library, aux_library, and kitchen. We assume that these places are accessible
from each other without the need to navigate any corridors, and that doors between these places are open.

• An instance of the sort robot, called rob1. Also, a number of instances of subsorts of the sort object.
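The sort hierarchy of Example 1 can be mirrored by a small Python sketch that checks inheritance. The dictionaries `PARENT` and `INSTANCE_SORT` and the helpers `is_subsort` and `has_sort` are illustrative; in the architecture, sorts are declared in the SPARC program rather than in Python.

```python
from typing import Dict, Optional

# Parent sort of each sort in Example 1 (None marks a top-level sort).
PARENT: Dict[str, Optional[str]] = {
    "place": None,
    "thing": None,
    "robot": "thing",
    "object": "thing",
    "textbook": "object",
    "printer": "object",
    "kitchenware": "object",
}

# Hypothetical declaration of the sort of each named constant.
INSTANCE_SORT: Dict[str, str] = {
    "rob1": "robot",
    "office": "place",
    "main_library": "place",
    "aux_library": "place",
    "kitchen": "place",
}


def is_subsort(sort: str, ancestor: str,
               parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """True if `sort` is `ancestor` or inherits from it (reflexive, transitive)."""
    current: Optional[str] = sort
    while current is not None:
        if current == ancestor:
            return True
        current = parent[current]
    return False


def has_sort(constant: str, sort: str) -> bool:
    """True if the constant's declared sort is `sort` or one of its subsorts."""
    return is_subsort(INSTANCE_SORT[constant], sort)


if __name__ == "__main__":
    print(has_sort("rob1", "thing"), has_sort("rob1", "object"))
```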

As an extension of this illustrative example that will be used in the experimental trials on physical robots, consider 
the robot shown in Figure 1(b) operating in an office building whose map is shown in Figure 1(a). Assume that the
robot can (a) build and revise the domain map based on laser range finder data; (b) visually recognize objects of 
interest; and (c) execute actuation commands, although neither the information extracted from sensor inputs nor the 
action execution is completely reliable. Next, assume that the robot is in the study corner and is given the goal of 
fetching the robotics textbook. Since the robot knows that books are typically found in the main library, ASP-based 
reasoning provides a plan of abstract actions that require the robot to go to the main library, pick up the book and bring 
it back. For the first abstract action, i.e., for moving to the main library, the robot can focus on just the relevant part of 
the fine-resolution representation, e.g., the cells through which the robot must pass, but not the robotics book that is 
irrelevant at this stage of reasoning. It then creates and solves a POMDP for this movement sub-task, and executes a 
sequence of concrete movement actions until it believes that it has reached the main library with high probability. This 
information is used to reason at the coarse resolution, prompting the robot to execute the next abstract action to pick 
up the robotics book. Now, assume that the robot is unable to pick up the robotics book because it fails to find the book 
in the main library despite a thorough search. This observation violates what the robot expects to see based on default 
knowledge, but the robot explains this by understanding that the book was not in the main library to begin with, and 
creates a plan to go to the auxiliary library, the second most likely location for textbooks. In this case, assume that the 
robot finds the book and completes the task. The proposed architecture enables such robot behavior. 
3 Related Work


The objective of this paper is to enable robots to represent and reason with logic-based and probabilistic descriptions 
of domain knowledge and degrees of belief. We review some related work below. 

There are many recent examples of researchers using probabilistic graphical models such as POMDPs to formulate
tasks such as planning, sensing, navigation, and interaction on robots [1], [20], [25], ED. These formulations, by
themselves, are not well-suited for reasoning with commonsense knowledge, e.g., default reasoning and non-monotonic
logical reasoning. In parallel, research in classical planning and logic programming has provided many algorithms for 
knowledge representation and reasoning, which have been used on mobile robots. These algorithms typically require 
a significant amount of prior knowledge of the domain and the agent’s capabilities, and the preconditions and effects 
of the actions. Many of these algorithms are based on first-order logic, and do not support capabilities such as
non-monotonic logical reasoning, default reasoning, and the ability to merge new, unreliable information with the current
beliefs in a knowledge base. Other logic-based formalisms address some of these limitations. This includes, for
instance, theories of reasoning about action and change, as well as Answer Set Prolog (ASP), a non-monotonic logic
programming paradigm, which is well-suited for representing and reasoning with commonsense knowledge mini- 
An international research community has developed around ASP, with applications in cognitive robotics [15] and other
non-robotics domains. For instance, ASP has been used for planning and diagnostics by one or more simulated robot 
housekeepers US, and for representation of domain knowledge learned through natural language processing by robots 
interacting with humans El- ASP-based architectures have also been used for the control of unmanned aerial vehicles 
in dynamic indoor environments @12). However, ASP does not support quantitative models of uncertainty, whereas a 
lot of information available to robots is represented probabilistically to quantitatively model the uncertainty in sensor 
input processing and actuation. 

Many approaches for reasoning about actions and change in robotics and artificial intelligence (AI) are based 
on action languages, which are formal models of parts of natural language used for describing transition diagrams. 
There are many different action languages such as STRIPS, PDDL [19], BC [32], and ALd [17], which have been
used for different applications [10], [29]. In robotics applications, we often need to represent and reason with recursive
state constraints, non-boolean fluents and non-deterministic causal laws. We expanded ALd, which already supports
recursive state constraints, to address these requirements. We also expanded the notion of histories to include initial
state defaults. Action language BC also supports the desired capabilities but it allows causal laws specifying default 
values of fluents at arbitrary time steps, and is thus too powerful for our purposes and occasionally poses difficulties 
with representing all exceptions to such defaults when the domain is expanded. 

Robotics and AI researchers have designed algorithms and architectures based on the understanding that robots 
interacting with the environment through sensors and actuators need both logical and probabilistic reasoning
capabilities. For instance, architectures have been developed to support hierarchical representation of knowledge and
axioms in first-order logic, and probabilistic processing of perceptual information [30], [31], 4H), while deterministic
and probabilistic algorithms have been combined for task and motion planning on robots [28]. Another example is the
behavior control of a robot that included semantic maps and commonsense knowledge in a probabilistic relational
representation, and then used a continual planner to switch between decision-theoretic and classical planning procedures
based on degrees of belief [24]. The performance of such architectures can be sensitive to the choice of threshold
for switching between the different planning procedures, and the use of first-order logic in these architectures limits
the expressiveness and use of commonsense knowledge. More recent work has used a three-layered organization of
knowledge (instance, default and diagnostic), with knowledge at the higher level modifying that at the lower
levels, and a three-layered architecture (competence layer, belief layer and deliberative layer) for distributed control of
information flow, combining first-order logic and probabilistic reasoning for open world planning [23]. Declarative
programming has also been combined with continuous-time planners for path planning in mobile robot teams [42].
Other recent work has combined a probabilistic extension of ASP with POMDPs for commonsense inference and
probabilistic planning in human-robot dialog [54], used a probabilistic extension of ASP to determine some model
parameters of POMDPs [50], used an ASP-based architecture to support learning of action costs on a robot [29], and
combined logic programming and relational reinforcement learning to interactively and cumulatively discover domain
axioms [45] and affordances [46].

Combining logical and probabilistic reasoning is a fundamental problem in AI, and many principled algorithms 
have been developed to address this problem. For instance, a Markov logic network combines probabilistic graphical
models and first-order logic, assigning weights to logic formulas [39]. Bayesian Logic relaxes the unique name
constraint of first-order probabilistic languages to provide a compact representation of distributions over varying sets
of objects [36]. Other examples include independent choice logic [38], PRISM lETl, probabilistic first-order logic [22],
first-order relational POMDPs l26ll4T l, and P-log, which assigns probabilities to different possible worlds represented as
answer sets of ASP programs 12|33l. Despite significant prior research, knowledge representation and reasoning
for robots collaborating with humans continues to present many open problems. Algorithms based on first-order 
logic do not support non-monotonic logical reasoning, and do not provide the desired expressiveness for capabilities 
such as default reasoning—it is not always possible to express degrees of belief and uncertainty quantitatively, e.g., 
by attaching probabilities to logic statements. Other algorithms based on logic programming do not support one 
or more of the capabilities such as reasoning about relations as in causal Bayesian networks; incremental addition of 
probabilistic information; reasoning with large probabilistic components; or dynamic addition of variables to represent 
open worlds. Our prior work has developed architectures that support different subsets of these capabilities. For 
instance, we developed an architecture that coupled planning based on a hierarchy of POMDPs [47], [52] with
ASP-based inference. The domain knowledge included in the ASP knowledge base of this architecture was incomplete and
considered default knowledge, but did not include a model of action effects. ASP-based inference provided priors
for POMDP state estimation, and observations and historical data from comparable domains were considered for
reasoning about the presence of target objects in the domain [53]. Building on recent work 844115X1, this paper describes
a general refinement-based architecture for knowledge representation and reasoning in robotics. The architecture 
enables robots to represent and reason with descriptions of incomplete domain knowledge and uncertainty at different 
levels of granularity, tailoring sensing and actuation to tasks to support scaling to larger, complex domains. 

4 Design Methodology 

Our proposed architecture is based on a design methodology. A designer following this methodology will: 

1. Provide a coarse-resolution description of the robot’s domain in action language ALd together with the
description of the initial state.

2. Provide the necessary domain-specific information for, and construct and examine correctness of, the fine- 
resolution refinement of the coarse-resolution description. 

3. Provide domain-specific information and randomize the fine-resolution description of the domain to capture the 
non-determinism in action execution. 

4. Run experiments and collect statistics to compute probabilities of the outcomes of actions and the reliability of 
observations. 

5. Provide these components, together with any desired goal, to a reasoning system that directs the robot towards 
achieving this goal. 

The reasoning system implements an action loop that can be viewed as an interplay between a logician and a statistician
(Section 1 and Section 9). In this paper, the reasoning system uses ASP-based non-monotonic logical reasoning,
POMDP-based probabilistic reasoning, models and descriptions constructed during the design phase, and records 
of action execution and observations obtained from the robot. The following sections describe components of the 
architecture, design methodology steps, and the reasoning system. We first define some basic notions, specifically 
action description and domain history, which are needed to build mathematical domain models. 

5 Action Language and Histories 

This section first describes extensions to action language ALd to support non-boolean fluents and non-deterministic
causal laws (Section 5.1). Next, Section 5.2 expands the notion of the history of a dynamic domain to include initial
state defaults, defines models of such histories, and describes how these models can be computed. The subsequent 
sections describe the use of these models (of action description and history) to provide the coarse-resolution description 
of the domain, and to build more refined fine-resolution models of the domain. 
5.1 ALd with non-boolean fluents and non-determinism

Action languages are formal models of parts of natural language used for describing transition diagrams. In this paper, we extend action language ALd (we preserve the old name for simplicity) to allow non-boolean fluents and non-deterministic causal laws. A system description of ALd consists of a sorted signature containing a collection of basic sorts organized into an inheritance hierarchy, and three special sorts: statics, fluents and actions. Statics are domain properties whose truth values cannot be changed by actions (e.g., locations of walls and doors), fluents are properties whose values can be changed by actions (e.g., location of the robot), and actions are sets of elementary actions that can be executed in parallel. Fluents of ALd are divided into basic and defined. The defined fluents are boolean, do not obey laws of inertia, and are defined in terms of other fluents, whereas basic fluents obey laws of inertia (thus often called inertial fluents in the knowledge representation literature) and are directly changed by actions. There are two types of basic fluents. The first type is related to the physical properties of the domain; fluents of this type can be changed by actions that change the physical state. The second type, called a basic knowledge fluent, is changed by knowledge-producing actions that only change the agent’s knowledge about the domain. Atoms in ALd are of the form f(x) = y, where y and the elements of x are variables or properly typed object constants;² when convenient, we also write this as f(x, y). If f is boolean, we use the standard notation f(x) and ¬f(x). Literals are expressions of the form f(x) = y and f(x) ≠ y. For instance, in our example domain (Example 1), static next_to(Pl1, Pl2) says that places Pl1 and Pl2 are next to each other, basic fluent loc(Th) = Pl says that thing Th is located in place Pl, basic fluent in_hand(R, Ob) says that robot R is holding object Ob, and action move(R, Pl) moves robot R to place Pl.

ALd allows five types of statements: deterministic causal laws, non-deterministic causal laws, state constraints,
definitions, and executability conditions. Deterministic causal laws are of the form:

a causes f(x) = y if body   (1)

where a is an action, f is a basic fluent, and body is a collection of literals. Statement 1 says that if a is executed in a
state satisfying body, the value of f in any resulting state would be y. Non-deterministic causal laws are of the form:

a causes f(x) : {Y : p(Y)} if body   (2)

where p is a unary boolean function and p(Y) is a boolean literal, or:

a causes f(x) : sort_name if body   (3)

Statement 2 says that if a were to be executed in a state satisfying body, f may take on any value from the set
{Y : p(Y)} ∩ range(f) in the resulting state. Statement 3 says that f may take any value from sort_name ∩ range(f).
If the body of a causal law is empty, the keyword if is omitted. In the context of Example 1, the deterministic causal law:

move(R, Pl) causes loc(R) = Pl

says that a robot R moving to place Pl will end up in Pl. Examples of other forms of the causal law are provided later.
State constraints are of the form:

f(x) = y if body   (4)

where f is a basic fluent or static. The state constraint says that f(x) = y must be true in every state satisfying body.
For instance, the constraint:


loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)

guarantees that the object grasped by a robot shares the robot’s location.

The definition of a defined fluent f(x) is a collection of statements of the form:

f(x) if body   (5)

2 This representation of relations as functions, in the context of ASP, is based on prior work 0. 
As in logic programming definitions, f(x) is true if it follows from the truth of at least one of its defining rules.
Otherwise, f(x) is false.

Executability conditions are statements of the form: 

impossible a0, ..., ak if body   (6)

which implies that in a state satisfying body, actions a0, ..., ak cannot be executed simultaneously. For instance, the
following executability condition: 


impossible move(R,Pl ) if loc(R)=Pl 

implies that a robot cannot move to a location if it is already there. 

A collection of statements of ALd forms a system description D. The semantics of D is given by a transition
diagram τ(D) whose nodes correspond to possible states of the system. The diagram contains an arc (σ1, a, σ2) if, after
the execution of action a in state σ1, the system may move into state σ2. We define the states and transitions of τ(D)
in terms of answer sets of logic programs, as described below.

Recall that an interpretation of the signature of D is an assignment of a value to each f(x) from the signature. An
interpretation can be represented by the collection of atoms of the form f(x) = y, where y is the value of f(x). For any
interpretation σ, let σ^nd denote the collection of all atoms of σ formed by basic fluents and statics (“nd” stands for
non-defined). Also, let Πc(D), where c stands for constraints, denote the logic program defined as follows:

1. For every state constraint (Statement 4) and definition (Statement 5), program Πc(D) contains:

f(x) = y ← body

2. For every defined fluent f, Πc(D) contains the closed world assumption (CWA):

¬f(x) ← body, not f(x)

where, unlike classical negation “¬a”, which implies that “a is believed to be false”, default negation “not a” only
implies that “a is not believed to be true”.

We can now define states of τ(D).

Definition 1. [State of τ(D)]

An interpretation σ is a state of the transition diagram τ(D) if it is the unique answer set of program Πc(D) ∪ σ^nd.

The uniqueness of an answer set is guaranteed for a large class of system descriptions that are well-founded. Although
well-foundedness is not easy to check, the broad syntactic condition called weak-acyclicity, which is easy to check,
is a sufficient condition for well-foundedness. All system descriptions discussed henceforth in this paper are
assumed to be well-founded.
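As a rough illustration of Definition 1, the Python sketch below checks whether an interpretation is a state in a special case we assume for simplicity: every state constraint and definition is ground and positive (no default negation in its body), and the CWA is applied unconditionally. Under these assumptions, which are ours and not part of ALd, the unique answer set of Πc(D) ∪ σ^nd is the least fixpoint of the rules plus the CWA for defined fluents. The encoding of atoms as (term, value) pairs and the defined fluent busy(rob1) are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Atom = Tuple[str, str]          # (term, value), e.g. ("loc(rob1)", "kitchen")
Rule = Tuple[Atom, List[Atom]]  # (head, body): head holds if every body atom holds


def least_model(facts: Set[Atom], rules: List[Rule],
                defined: List[str]) -> FrozenSet[Atom]:
    """Forward-chain the positive rules to a fixpoint, then apply the CWA."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    for term in defined:  # simplified CWA: an underived defined fluent is false
        if (term, "true") not in model:
            model.add((term, "false"))
    return frozenset(model)


def is_state(sigma: Set[Atom], rules: List[Rule], defined: List[str]) -> bool:
    """Definition 1 restricted to positive rules: sigma must equal the model
    computed from its non-defined part sigma_nd and be single-valued."""
    sigma_nd = {atom for atom in sigma if atom[0] not in defined}
    model = least_model(sigma_nd, rules, defined)
    terms = [term for term, _ in model]
    return len(terms) == len(set(terms)) and model == frozenset(sigma)


# Ground instance of: loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)
rules: List[Rule] = [
    (("loc(cup1)", "kitchen"),
     [("loc(rob1)", "kitchen"), ("in_hand(rob1,cup1)", "true")]),
    # Hypothetical defined fluent: busy(rob1) if in_hand(rob1, cup1)
    (("busy(rob1)", "true"), [("in_hand(rob1,cup1)", "true")]),
]
defined = ["busy(rob1)"]

ok = {("loc(rob1)", "kitchen"), ("loc(cup1)", "kitchen"),
      ("in_hand(rob1,cup1)", "true"), ("busy(rob1)", "true")}
bad = {("loc(rob1)", "kitchen"), ("loc(cup1)", "office"),
       ("in_hand(rob1,cup1)", "true"), ("busy(rob1)", "true")}

print(is_state(ok, rules, defined), is_state(bad, rules, defined))  # True False
```

In bad, the held cup is recorded in the office while the robot is in the kitchen, so the state constraint derives a second value for loc(cup1) and the interpretation is rejected.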

Our definition of the transition relation of τ(D) is also based on the notion of the answer set of a logic program.
Definition 2. [Transition of τ(D)]

To describe a transition (σ0, a, σ1), we construct a program Π(D, σ0, a) consisting of:

• Logic programming encoding Π(D) of system description D.

• Initial state σ0.

• Set of actions a.

The answer sets of this program determine the states the system can move into after the execution of a in σ0. The
encoding Π(D) of system description D consists of the encoding of the signature of D and rules obtained from
statements of D, as described below.
• Encoding of the signature: we start with the encoding sig(D) of the signature of D.

- For each basic sort c, sig(D) contains: sort_name(c).

- For each subsort link (c1, c2) of the hierarchy of basic sorts, sig(D) contains: s_link(c1, c2).

- For each membership link (x, c) of the hierarchy, sig(D) contains: m_link(x, c).

- For every function symbol f : c1 × ... × cn → c, the signature sig(D) contains the domain: dom(f, c1, ..., cn),
and range: range(f, c).

- For every static g of D, sig(D) contains: static(g).

- For every f(x) where f is a basic fluent, sig(D) contains: fluent(basic, f(x)).

- For every f(x) where f is a defined fluent, sig(D) contains: fluent(defined, f(x)).

- For every action a of D, sig(D) contains: action(a).

We also need axioms describing the hierarchy of basic sorts:

subsort(C1, C2) ← s_link(C1, C2)
subsort(C1, C2) ← s_link(C1, C), subsort(C, C2)
member(X, C) ← m_link(X, C)
member(X, C1) ← m_link(X, C0), subsort(C0, C1)

• Encoding of statements of D: for this encoding we need two steps that stand for the beginning and the end
of a transition. This is sufficient for describing a single transition; however, we later describe longer chains of
events and let steps range over [0, n] for some constant n. To allow an easier generalization of the program, we
encode steps by using constant n for the maximum number of steps, as follows:

step(0..n)

where n will be assigned a specific value based on the goals under consideration, e.g., #const n = 3. We also need
a relation val(f(x1, ..., xm), y, i), which states that the value of f(x1, ..., xm) at step i is y; and relation occurs(a, i),
which states that compound action a occurred at step i, i.e., occurs({a0, ..., ak}, i) =def {occurs(aj, i) : 0 ≤ j ≤ k}.
We use this notation to encode statements of D as follows:

- For every causal law (Statements 2 and 3), where the range of f is {y1, ..., yk}, Π(D) contains a rule:

val(f(x), y1, I+1) or ... or val(f(x), yk, I+1) ← val(body, I), occurs(a, I), I < n

where val(body, I) is obtained by replacing every literal fm(xm) = z from body by val(fm(xm), z, I). To
encode that, due to this action, f(x) only takes a value that satisfies property p, Π(D) contains a constraint:

← val(f(x), Y, I+1), not val(p(Y), true, I)

and rules:

satisfied(p, I) ← val(p(Y), true, I)

¬occurs(a, I) ← not satisfied(p, I)

- For every state constraint and definition (Statements 4 and 5), Π(D) contains:

val(f(x), y, I) ← val(body, I)

- Π(D) contains the CWA for defined fluents:

val(F, false, I) ← fluent(defined, F), not val(F, true, I)
- For every executability condition (Statement 6), Π(D) contains:

¬occurs(a0, I) or ... or ¬occurs(ak, I) ← val(body, I), I < n

- Π(D) contains the Inertia Axiom:

val(F, Y, I+1) ← fluent(basic, F),
                 val(F, Y, I), not ¬val(F, Y, I+1), I < n

- Π(D) contains CWA for actions:

¬occurs(A, I) ← not occurs(A, I), I < n


- Finally, we need the rule:

¬val(F, Y1, I) ← val(F, Y2, I), Y1 ≠ Y2

which says that a fluent can only have one value at each time step. 

This completes the construction of encoding Π(D) of system description D. Note that the axioms described
above are shorthand for the set of ground instances obtained from them by replacing variables by (the available) ground
terms from the corresponding sorts.
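The sort-hierarchy axioms in the encoding of the signature compute a transitive closure. The Python sketch below evaluates subsort and member bottom-up for a small fragment of the hierarchy of the office domain (textbook below object, object and robot below thing); the fragment and the pair encoding are illustrative assumptions, and the sketch mirrors the four axioms rather than the way an ASP solver grounds and evaluates them.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def subsort_closure(s_links: Set[Link]) -> Set[Link]:
    """subsort(C1, C2) <- s_link(C1, C2).
    subsort(C1, C2) <- s_link(C1, C), subsort(C, C2)."""
    subsort = set(s_links)
    changed = True
    while changed:
        changed = False
        for c1, c in s_links:
            for c_mid, c2 in list(subsort):
                if c == c_mid and (c1, c2) not in subsort:
                    subsort.add((c1, c2))
                    changed = True
    return subsort


def member_closure(m_links: Set[Link], subsort: Set[Link]) -> Set[Link]:
    """member(X, C) <- m_link(X, C).
    member(X, C1) <- m_link(X, C0), subsort(C0, C1)."""
    member = set(m_links)
    for x, c0 in m_links:
        for c_low, c1 in subsort:
            if c0 == c_low:
                member.add((x, c1))
    return member


s_links = {("textbook", "object"), ("object", "thing"), ("robot", "thing")}
m_links = {("tb1", "textbook"), ("rob1", "robot")}
subsort = subsort_closure(s_links)
member = member_closure(m_links, subsort)
print(sorted(member))
```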

To continue with our definition of transition (σ0, a, σ1), we describe the two remaining parts of program Π(D, σ0, a):
the encoding val(σ0, 0) of initial state σ0, and the encoding occurs(a, 0) of action a:

val(σ0, 0) =def {val(f(x), y, 0) : (f(x) = y) ∈ σ0}
occurs(a, 0) =def {occurs(ai, 0) : ai ∈ a}

To complete program Π(D, σ0, a), we simply gather our description of the system’s laws, together with the description
of the initial state and the actions that occur in it:

Π(D, σ0, a) =def Π(D) ∪ val(σ0, 0) ∪ occurs(a, 0)

Now we are ready to define the notion of transition of τ(D). Let a be a non-empty collection of actions, and σ0 and
σ1 be states of the transition diagram τ(D) defined by a system description D. A state-action-state triple (σ0, a, σ1) is
a transition of τ(D) iff Π(D, σ0, a) has an answer set AS such that σ1 = {f(x) = y : val(f(x), y, 1) ∈ AS}.
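The facts val(σ0, 0) and occurs(a, 0) are simple enough to generate mechanically. The sketch below assembles the text of Π(D, σ0, a) from a dictionary σ0 and a set of elementary actions; the Π(D) argument is a placeholder string, and the ground terms (rob1, tb1, office, kitchen) are illustrative. The output uses plain ASP fact syntax and is not claimed to be a complete SPARC program, since SPARC programs also declare sorts and predicates.

```python
from typing import Dict, Iterable, List


def encode_initial_state(sigma0: Dict[str, str]) -> List[str]:
    """val(sigma0, 0): one fact val(f(x), y, 0) per atom f(x) = y of sigma0."""
    return [f"val({term}, {value}, 0)." for term, value in sorted(sigma0.items())]


def encode_actions(actions: Iterable[str]) -> List[str]:
    """occurs(a, 0): one fact occurs(a_i, 0) per elementary action a_i of a."""
    return [f"occurs({action}, 0)." for action in sorted(actions)]


def transition_program(pi_d: str, sigma0: Dict[str, str],
                       actions: Iterable[str]) -> str:
    """Pi(D, sigma0, a) = Pi(D) U val(sigma0, 0) U occurs(a, 0), as program text."""
    actions = list(actions)
    if not actions:
        raise ValueError("a transition requires a non-empty collection of actions")
    lines = [pi_d] + encode_initial_state(sigma0) + encode_actions(actions)
    return "\n".join(lines)


sigma0 = {"loc(rob1)": "office", "loc(tb1)": "office"}
print(transition_program("% ... encoding Pi(D) ...", sigma0, {"move(rob1, kitchen)"}))
```

Passing the resulting text to a solver and reading off the val(·, ·, 1) atoms of each answer set would then give the possible successor states σ1.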

5.2 Histories with defaults 

A dynamic domain’s recorded history is usually a collection of records of the form obs(L, I), which says that a literal L
is observed at step I, e.g., obs(loc(tb1) = office, 0) denotes the observation of textbook tb1 in the office (when
convenient, this will also be written as obs(loc(tb1, office), true, 0)); and hpd(action, step), which says that a specific
action happened at a given step, e.g., hpd(move(rob1, kitchen), 1). In addition to the statements described
above, we introduce an additional type of historical record:

initial default f(x) = y if body   (7)

where f(x) is a basic fluent. We illustrate the use of such initial state defaults with an example, before providing a 
formal description. 

Example 2. [Example of defaults] 

Consider the following statements about the locations of textbooks in the initial state in our illustrative example. 
Textbooks are typically in the main library. If a textbook is not there, it is typically in the auxiliary library. If a 
textbook is checked out, it can usually be found in the office. These defaults can be represented as: 

initial default loc(X) = main_library if textbook(X)    % Default d1   (8)

initial default loc(X) = aux_library if textbook(X),    % Default d2   (9)
                         loc(X) ≠ main_library

initial default loc(X) = office if textbook(X),         % Default d3   (10)
                         loc(X) ≠ main_library,
                         loc(X) ≠ aux_library

where we use the fluent loc : thing → place. Intuitively, a history Ha with the above statements entails val(loc(tb1) =
main_library, true, 0) for textbook tb1. History Hb, which adds obs(loc(tb1) ≠ main_library, 0) as an observation
to Ha, renders default d1 (Statement 8) inapplicable; it entails val(loc(tb1) = aux_library, true, 0) based on
default d2 (Statement 9). A history Hc that adds observation obs(loc(tb1) ≠ aux_library, 0) to Hb should entail
val(loc(tb1) = office, true, 0). Adding observation obs(loc(tb1) ≠ main_library, 1) to Ha results in history Hd, which
defeats default d1 because, if this default’s conclusion is true in the initial state, it is also true at step 1 (by inertia), which
contradicts our observation. Default d2 will conclude that this book is initially in the aux_library, and the inertia axiom will
propagate this information to entail val(loc(tb1) = aux_library, true, 1). Figure 2 illustrates the beliefs of a robot
corresponding to these four histories. Please see example2.sp at https://github.com/mhnsrdhrn/refine-arch
for an example of the complete program in SPARC.
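To make the intended conclusions of Example 2 concrete, the sketch below hard-codes the priority cascade of defaults d1, d2 and d3 and reproduces the believed location of tb1 for the four histories. It illustrates what the ASP encoding should entail; it is not an implementation of that encoding. It assumes that no action moves tb1, so that inertia carries the initial location to step 1, and it represents each observation obs(loc(tb1) ≠ P, I) only by the excluded place P.

```python
from typing import Dict, List, Set, Tuple

# Heads of defaults d1, d2, d3 in priority order (Statements 8 to 10): each default
# applies only when the heads of all earlier defaults are known to be false.
DEFAULT_HEADS: List[str] = ["main_library", "aux_library", "office"]


def believed_initial_location(excluded_step0: Set[str],
                              excluded_step1: Set[str]) -> str:
    """Location of tb1 concluded from the defaults, given the places P ruled out
    by observations obs(loc(tb1) != P, I) at steps 0 and 1.

    Assumes no action moves tb1, so by inertia a step-1 observation also rules
    out P as the initial location (this is what defeats d1 in history Hd)."""
    ruled_out = excluded_step0 | excluded_step1
    for place in DEFAULT_HEADS:
        if place not in ruled_out:
            return place
    # In the ASP encoding, the CR rules would then let loc(tb1) take another value.
    raise ValueError("all three defaults are defeated")


histories: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Ha": (set(), set()),
    "Hb": ({"main_library"}, set()),
    "Hc": ({"main_library", "aux_library"}, set()),
    "Hd": (set(), {"main_library"}),
}
for name, (obs0, obs1) in histories.items():
    print(name, believed_initial_location(obs0, obs1))
# Ha main_library / Hb aux_library / Hc office / Hd aux_library
```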


[Figure 2: three rooms, Room 1 = main library (main_library), Room 2 = auxiliary library (aux_library), Room 3 = office (office); the defaults d1, d2 and d3 of Statements 8 to 10; and the table below.]

History  Defaults     Observation at step 0                              Observation at step 1    Believed location of tb1
Ha       d1, d2, d3   none                                               none                     main_library
Hb       d1, d2, d3   loc(tb1) ≠ main_library                            none                     aux_library
Hc       d1, d2, d3   loc(tb1) ≠ main_library, loc(tb1) ≠ aux_library    none                     office
Hd       d1, d2, d3   none                                               loc(tb1) ≠ main_library  aux_library


Figure 2: Illustration of the beliefs of a robot corresponding to the four histories (with the same initial state defaults)
described in Example 2.

The history H of the system defines a collection of its models, i.e., trajectories of the system considered possible by
the agent recording this history. To define such models, consider the program Π(D, H) obtained by adding to Π(D):

• Record of observations and actions from H.

• Rules for every default (Statement 7):

val(f(x), y, 0) ← val(body, 0), not ¬val(f(x), y, 0)   (11)

val(f(x), Y, 0) ←+ val(body, 0), range(f, C), member(Y, C), Y ≠ y   % CR rule   (12)

is_defined(f(x)) ← val(body, 0)

where the second rule is a consistency restoring (CR) rule, which states that to restore consistency of the program
one may assume that the conclusion of the default is not the expected one.

• A rule defining initial values of fluents through observations:

is_defined(f(x)) ← obs(f(x) = y, 0)   (13)

• A rule for every basic fluent:

val(f(x), y1, 0) or ... or val(f(x), yn, 0) ← not is_defined(f(x))   (14)

where {y1, ..., yn} are the elements in the range of f not occurring in the head of any initial default of H. This rule
states that if a fluent is not defined in the initial state by the head of an initial default, or by an observation, it
must take on some possible value from its range.

• A reality check:

← val(F, Y1, I), obs(F = Y2, I), Y1 ≠ Y2
← val(F, Y1, I), obs(F ≠ Y1, I)   (15)

which states that an observation of a fluent shall return the fluent’s expected value.

• And a rule:

occurs(A, I) ← hpd(A, I)   (16)
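The reality check (15) rejects any candidate trajectory that disagrees with a recorded observation. The Python sketch below applies the same two tests to an explicit trajectory, given as one dictionary per step; the tuple format for observations is an assumption made for illustration, and the sketch checks a given trajectory rather than pruning a search as the ASP program does.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]            # term -> value at one step
Obs = Tuple[str, str, bool, int]  # (term, value, is_equality, step)


def passes_reality_check(trajectory: List[State], observations: List[Obs]) -> bool:
    """Return False if some observation contradicts the trajectory (constraints (15))."""
    for term, value, is_equality, step in observations:
        actual = trajectory[step].get(term)
        if is_equality and actual != value:      # obs(F = Y2, I) with val(F, Y1, I), Y1 != Y2
            return False
        if not is_equality and actual == value:  # obs(F != Y1, I) with val(F, Y1, I)
            return False
    return True


# History Hd of Example 2: loc(tb1) != main_library observed at step 1.
hd_obs: List[Obs] = [("loc(tb1)", "main_library", False, 1)]
stays_in_main = [{"loc(tb1)": "main_library"}, {"loc(tb1)": "main_library"}]
stays_in_aux = [{"loc(tb1)": "aux_library"}, {"loc(tb1)": "aux_library"}]
print(passes_reality_check(stays_in_main, hd_obs),
      passes_reality_check(stays_in_aux, hd_obs))  # False True
```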

To define a model of the history, we also need the following auxiliary definitions. 

Definition 3. [Defined Sequence]

We say that a set S of literals defines a sequence (σ0, a0, σ1, ..., a_{n−1}, σn) if

• For every 0 ≤ i ≤ n, (f(x) = y) ∈ σi iff val(f(x), y, i) ∈ S.

• For every 0 ≤ i < n, e ∈ ai iff occurs(e, i) ∈ S.

Definition 4. [Compatible initial states]

A state σ of τ(D) is compatible with a description I of the initial state of history H if:

• σ satisfies all observations of I; and

• σ contains the closure of the union of statics of D and the set {f = y : obs(f = y, 0) ∈ I} ∪ {f ≠ y : obs(f ≠ y, 0) ∈ I}.




Let Ik be the description of the initial state of history Hk. States in Example 2 compatible with Ia, Ib, and Ic must
then contain {loc(tb1) = main_library}, {loc(tb1) = aux_library}, and {loc(tb1) = office}, respectively. There are
multiple such states, which differ by the location of the robot. Since Ia = Id, they have the same compatible states.


Next, we define models of history H, i.e., paths of the transition diagram of D compatible with H.

Definition 5. [Models]

A path M = (σ0, a0, σ1, ..., a_{n−1}, σn) of τ(D) is a model of history H of D with description I of its initial state if
there is a collection E of obs statements such that:

1. If obs(f = y, 0) ∈ E, then f ≠ y is the head of one of the defaults of I. Similarly for obs(f ≠ y, 0).

2. The initial state of M is compatible with the description IE = I ∪ E.

3. Path M satisfies all observations of fluents and action occurrences in H.

4. There is no collection E' of obs statements that has fewer elements than E and satisfies the conditions above.

We will refer to E as an explanation of H. For example, consider the four histories described in Example 2. Models of
Ha, Hb, and Hc are paths consisting of initial states compatible with Ia, Ib, and Ic; the corresponding explanations
are empty. However, in the case of Hd the situation is different: the predicted location of tb1 will be different from
the observed one. The only explanation of this discrepancy is that tb1 is an exception to the first default. Adding
E = {obs(loc(tb1) ≠ main_library, 0)} to Id will resolve this problem.

We illustrate this definition by some more examples. 

Example 3. [Examples of Models] 

Consider a system description Da with basic boolean fluents f and g and a history Ha:


initial default ¬g if f


{f, ¬g}, {¬f, g}, and {¬f, ¬g} are models of (Da, Ha), and σ = {f, g} is not. The latter is not surprising: even
though σ may be physically possible, the agent, relying on the default, will not consider σ to be compatible with the
default, since the history gives no evidence that the default should be violated.

If the agent were to record an observation obs(g, 0), the only states compatible with the resulting history (i.e., the
models) would be {f, g} and {¬f, g}. Next, we expand our system description by a basic fluent h and a state constraint:

h if ¬g

In this case, to compute models of a history of the expanded system description consisting of the default and the
observation obs(¬h, 0), we need CR rules (see the CR rule for defaults above). The models are {f, ¬h, g} and {¬f, ¬h, g}.
Next, consider a system description with basic fluents f, g, and h, the initial default, action a, and a causal law:


a causes h if ¬g

and a history consisting of obs(f, 0) and hpd(a, 0); ({f, ¬g, h}, a, {f, ¬g, h}) and ({f, ¬g, ¬h}, a, {f, ¬g, h}) are the
two models of this history. Finally, the history obtained by adding obs(¬h, 1) to it has a single model ({f, g, ¬h}, a, {f, g, ¬h}).
The new observation is an indirect exception to the initial default, which is resolved using the corresponding CR rule.


Computing models of system histories: The definition of models of a history H of a system D (see Definition 5)
suggests a simple algorithm for computing a model of H. We only need to use an (existing) answer set solver to
compute an answer set AS of the program Π(D, H), and check whether AS defines a path of τ(D). To do the latter, we need
to check that for every 0 ≤ i ≤ n, the set:


{f(x) = y : val(f(x), y, i) ∈ AS}

is a state of τ(D), and that the triples (σi, ai, σi+1) defined by AS are transitions of τ(D). A triple (σi, ai, σi+1) is defined by
AS if σi = {f(x) = y : val(f(x), y, i) ∈ AS}, σi+1 = {f(x) = y : val(f(x), y, i+1) ∈ AS}, and ai = {e : occurs(e, i) ∈ AS}.
Although this check can be done using the definitions of state and transition, the following proposition shows that for a
large class of system descriptions, this computation can (fortunately) be avoided.
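Reading a candidate path off an answer set, as in Definition 3, is mechanical. The sketch below does this for an answer set given as a set of Python tuples; the tuple encoding (with "-val" standing for ¬val) is an assumption for illustration and not the output format of any particular solver.

```python
from typing import Dict, List, Set, Tuple


def defined_sequence(answer_set: Set[Tuple], n: int):
    """Return (states, actions): sigma_0..sigma_n and a_0..a_{n-1} defined by the
    val/4-tuples and occurs/3-tuples of the answer set (Definition 3)."""
    states: List[Dict[str, str]] = [{} for _ in range(n + 1)]
    actions: List[Set[str]] = [set() for _ in range(n)]
    for atom in answer_set:
        if atom[0] == "val":
            _, term, value, step = atom
            states[step][term] = value
        elif atom[0] == "occurs" and atom[2] < n:
            actions[atom[2]].add(atom[1])
        # Other atoms ("-val", "-occurs", signature atoms, ...) are ignored.
    return states, actions


answer_set = {
    ("val", "loc(rob1)", "office", 0),
    ("occurs", "move(rob1, kitchen)", 0),
    ("val", "loc(rob1)", "kitchen", 1),
    ("-val", "loc(rob1)", "office", 1),
}
states, actions = defined_sequence(answer_set, n=1)
print(states, actions)
```

Checking that each (σi, ai, σi+1) is a transition would then require the construction of Definition 2 for every step; Proposition 1 is what makes that step unnecessary.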

Proposition 1. [Models and Answer Sets] 

A path M = (σ0, a0, σ1, ..., a_{n−1}, σn) of τ(D) is a model of history H iff there is an answer set AS of the program
Π(D, H) such that:

1. A fluent literal (f = y) ∈ σi iff val(f, y, i) ∈ AS,

2. A fluent literal (f ≠ y1) ∈ σi iff val(f, y2, i) ∈ AS for some y2 ≠ y1,

3. An action e ∈ ai iff occurs(e, i) ∈ AS.

The proof of this proposition is in Appendix A. This proposition allows us to reduce the task of planning to computing
answer sets of a program obtained from Π(D, H) by adding the definition of a goal, a constraint stating that the
goal must be achieved, and a rule generating possible future actions of the robot. In other words, the same process of
computing answer sets can be used for inference, planning and diagnostics.

6 Logician’s Domain Representation 

We are now ready for the first step of our design methodology, i.e., to specify the transition diagram of the logician.


1. Specify the transition diagram, τH, which will be used by the logician for high-level reasoning, including
planning and diagnostics.


This step is accomplished by providing the signature and ALd axioms of the system description DH defining this diagram.
We will use standard techniques for representing knowledge in action languages. We illustrate this process
by describing the domain representation for the office domain described in Example 1.

Example 4. [Logician's domain representation]

The system description DH of the domain in Example 1 consists of a sorted signature (ΣH) and axioms describing the
transition diagram τH. ΣH defines the names of objects and functions available for use by the logician. As described
in Example 1, the sorts are: place, thing, robot, and object, with object and robot being subsorts of thing. The sort
object has subsorts such as textbook, printer and kitchenware. The statics include a relation next_to(place, place),
which describes whether two places are next to each other. The signatures of the fluents of the domain are loc : thing → place
and in_hand : robot × object → boolean. The value of loc(Th) is Pl if thing Th is located at place Pl. The fluent
in_hand(R, Ob) is true if robot R is holding object Ob. These are basic fluents subject to the laws of inertia. The domain
has three actions: move(robot, place), grasp(robot, object), and putdown(robot, object). The domain dynamics are
defined using axioms that consist of causal laws:

move(R, Pl) causes loc(R) = Pl   (17)

grasp(R, Ob) causes in_hand(R, Ob)
putdown(R, Ob) causes ¬in_hand(R, Ob)

state constraints:

loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)   (18)

next_to(Pl1, Pl2) if next_to(Pl2, Pl1)

and executability conditions:

impossible move(R, Pl) if loc(R) = Pl   (19)

impossible move(R, Pl2) if loc(R) = Pl1, ¬next_to(Pl1, Pl2)
[Figure 3. Panel (a), "Some state transitions": at coarse resolution, move(rob1, kitchen) and move(rob1, office) switch the robot between loc(rob1) = office and loc(rob1) = kitchen; at fine resolution, the same moves are sequences of move actions between adjacent cells, e.g., cells c1 and c2 in r1 (office) and cells c5 and c6 in r2 (kitchen). Panel (b): rooms (e.g., r2, r4), walls, and grid cells.]

Figure 3: (a) Illustration of state transitions for specific move actions in our illustrative (office) domain, viewed at
coarse resolution and at fine resolution; and (b) A closer look at specific places brings into focus the corresponding
rooms and grid cells in those rooms.


impossible A1, A2 if A1 ≠ A2

impossible grasp(R, Ob) if loc(R) = Pl1, loc(Ob) = Pl2, Pl1 ≠ Pl2
impossible grasp(R, Ob) if in_hand(R, Ob)
impossible putdown(R, Ob) if ¬in_hand(R, Ob)
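Because τH is deterministic and each transition involves a single action (the condition impossible A1, A2 if A1 ≠ A2 rules out concurrent actions), its transitions can be written as an ordinary function. The sketch below does this for the causal laws (17), state constraints (18) and executability conditions (19) of Example 4; the room map NEXT_TO and the object names are hypothetical, and inertia is modeled by copying the state before applying effects.

```python
from typing import Dict, Set, Tuple

Loc = Dict[str, str]            # thing -> place
InHand = Set[Tuple[str, str]]   # pairs (robot, object) with in_hand(robot, object)
Action = Tuple[str, str, str]   # (name, robot, argument)

NEXT_TO = {("office", "kitchen"), ("office", "library")}  # hypothetical rooms


def next_to(p1: str, p2: str) -> bool:
    # next_to(Pl1, Pl2) if next_to(Pl2, Pl1): the relation is symmetric.
    return (p1, p2) in NEXT_TO or (p2, p1) in NEXT_TO


def executable(loc: Loc, in_hand: InHand, action: Action) -> bool:
    name, robot, arg = action
    if name == "move":
        return loc[robot] != arg and next_to(loc[robot], arg)
    if name == "grasp":
        return loc[robot] == loc[arg] and (robot, arg) not in in_hand
    if name == "putdown":
        return (robot, arg) in in_hand
    return False


def transition(loc: Loc, in_hand: InHand, action: Action) -> Tuple[Loc, InHand]:
    """Successor state of a single executable action."""
    if not executable(loc, in_hand, action):
        raise ValueError(f"{action} is not executable")
    loc, in_hand = dict(loc), set(in_hand)  # inertia: unaffected fluents keep values
    name, robot, arg = action
    if name == "move":
        loc[robot] = arg                    # move(R, Pl) causes loc(R) = Pl
    elif name == "grasp":
        in_hand.add((robot, arg))           # grasp(R, Ob) causes in_hand(R, Ob)
    else:
        in_hand.discard((robot, arg))       # putdown(R, Ob) causes ¬in_hand(R, Ob)
    for r, ob in in_hand:                   # state constraint (18)
        loc[ob] = loc[r]
    return loc, in_hand


state = ({"rob1": "office", "tb1": "office"}, set())
state = transition(*state, ("grasp", "rob1", "tb1"))
state = transition(*state, ("move", "rob1", "kitchen"))
print(state)
```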


The part of ΣH described so far, the sort hierarchy and the signatures of functions, is unlikely to undergo substantial
changes for any given domain. However, the last step in the construction of ΣH is likely to undergo more frequent
revisions: it populates the basic sorts of the hierarchy with specific objects, e.g., robot = {rob1}, place = {r1, ..., rn}
where the ri are rooms, textbook = {tb1, ..., tbm}, kitchenware = {cup1, cup2, plate1, plate2}, etc. Ground instances of
the axioms are obtained by replacing variables by ground terms from the corresponding sorts.
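Grounding can be pictured as a Cartesian product over the sorts of a rule's variables. The sketch below grounds one executability condition of Example 4 using Python's itertools.product and str.format; the sort contents (robot rob1 and three rooms r1, r2, r3) and the template syntax with {R} and {Pl} placeholders are assumptions for illustration. An ASP solver performs this step internally.

```python
from itertools import product
from typing import Dict, List

# Hypothetical population of the basic sorts (the most frequently revised step).
SORTS: Dict[str, List[str]] = {
    "robot": ["rob1"],
    "place": ["r1", "r2", "r3"],
}


def ground(template: str, variable_sorts: Dict[str, str]) -> List[str]:
    """Return every instance of template obtained by replacing each variable
    with a ground term from its sort."""
    names = list(variable_sorts)
    domains = [SORTS[variable_sorts[name]] for name in names]
    return [template.format(**dict(zip(names, values)))
            for values in product(*domains)]


instances = ground("impossible move({R}, {Pl}) if loc({R}) = {Pl}",
                   {"R": "robot", "Pl": "place"})
for rule in instances:
    print(rule)
```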

The transition diagram τH described by DH is too large to depict in a picture. The top part of Figure 3(a) shows
the transitions of τH corresponding to a move between two places. The only fluent shown there is the location of
the robot rob1; the values of other fluents remain unchanged and are not shown. The actions of this
coarse-resolution transition diagram τH of the logician, as described above, are assumed to be deterministic, and the values
of its fluents are assumed to be observable. These assumptions allow the robot to do fast, tentative planning and
diagnostics necessary for achieving its assigned goals.

The domain representation described above should ideally be tested extensively. This can be done by including
various recorded histories of the domain, which may include histories with prioritized defaults (Example 2), and using
the resulting programs to solve various reasoning tasks.


The logician’s model of the world thus consists of the system description (Example 4), initial state defaults
(Example 2), and the recorded history of actions and observations. The logician achieves any given goal by first translating the
model (of the world) to an ASP program (see Sections 5.1 and 5.2). For planning and diagnostics, this program is passed
to an ASP solver; we use SPARC, which expands CR-Prolog and provides explicit constructs to specify objects,
relations, and their sorts. Please see example4.sp at https://github.com/mhnsrdhrn/refine-arch for the
SPARC version of the complete program. The solver returns an answer set of the program. Atoms of the form:

occurs(action, step)

belonging to this answer set, e.g., occurs(a1, 1), ..., occurs(an, n), represent the shortest sequence of abstract actions,
i.e., the shortest plan for achieving the logician’s goal. Prior research results in the theory of action languages and
ASP ensure that the plan is provably correct [18]. In a similar manner, suitable atoms in the answer set can be used for
diagnostics, e.g., to explain unexpected observations in terms of exogenous actions.
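The notion of a shortest plan can be illustrated without ASP. The sketch below swaps in breadth-first search over a tiny, deterministic version of τH (one robot, one textbook, and hypothetical rooms library, office and kitchen) to find a shortest action sequence that puts tb1 down in the office. It is a stand-in for intuition about what the answer set encodes, not the SPARC-based planner described above.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical coarse-resolution map: library - office - kitchen.
NEXT_TO: Set[Tuple[str, str]] = {("library", "office"), ("office", "kitchen")}

State = Tuple[str, str, bool]  # (loc(rob1), loc(tb1), in_hand(rob1, tb1))


def neighbours(place: str) -> List[str]:
    """Places next to `place`, using the symmetry of next_to."""
    return sorted({b for a, b in NEXT_TO if a == place}
                  | {a for a, b in NEXT_TO if b == place})


def successors(s: State) -> Iterator[Tuple[str, State]]:
    rob, book, held = s
    for p in neighbours(rob):  # move(rob1, P): a held book moves with the robot
        yield f"move(rob1, {p})", (p, p if held else book, held)
    if not held and rob == book:
        yield "grasp(rob1, tb1)", (rob, book, True)
    if held:
        yield "putdown(rob1, tb1)", (rob, book, False)


def shortest_plan(start: State, goal_place: str) -> Optional[List[str]]:
    """Breadth-first search: the first goal state dequeued has a shortest plan."""
    parent: Dict[State, Tuple[Optional[State], Optional[str]]] = {start: (None, None)}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if s[1] == goal_place and not s[2]:  # goal: loc(tb1) = goal_place, not held
            plan: List[str] = []
            prev, act = parent[s]
            while prev is not None:
                plan.append(act)
                s = prev
                prev, act = parent[s]
            return plan[::-1]
        for act, nxt in successors(s):
            if nxt not in parent:
                parent[nxt] = (s, act)
                frontier.append(nxt)
    return None


print(shortest_plan(("kitchen", "library", False), "office"))
```

Breadth-first search is used here only because it also returns a plan with the fewest actions; unlike the ASP encoding, it does not reason with defaults or histories.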
7 Refinement, Zoom and Randomization


For any given goal, each abstract action in the plan created by reasoning with the coarse-resolution domain representation
is implemented as a sequence of concrete actions by the statistician. To do so, the robot probabilistically reasons
about the part of the fine-resolution transition diagram relevant to the abstract action to be executed. This section
defines refinement, randomization, and the zoom operation, which are necessary to build the fine-resolution models
for such probabilistic reasoning, along with the corresponding steps of the design methodology.

7.1 Refinement

The second step of the design methodology corresponds to the construction of a fine-resolution transition diagram τL
from the coarse-resolution transition diagram τH. We illustrate this step by constructing the signature of τL and the
axioms of its system description for the office domain of Example 1. This construction is not entirely algorithmic:
although the signature of τH and its axioms in Example 4 will play an important role, the construction will also depend
on the result of the increase in the resolution, which is domain dependent. Note, however, that the designer provides
input only during the initial design phase; at run-time, all steps of planning and execution are algorithmic and
automated. We begin with some terminology.

If examining an object of ΣH at higher resolution leads to the discovery of new structure(s), the object is said to have
been magnified. Newly discovered parts of a magnified object are referred to as its refined components. We can
now construct the signature of τL.


2. Constructing the refinement τL of τH.
(a) Signature ΣL of τL


A signature ΣL is said to refine signature ΣH if:

• For every basic sort stH of ΣH whose elements were magnified by the increase in the resolution, ΣL contains
(a) a coarse-resolution version st* = stH; and (b) a fine-resolution version st = {o1, ..., om} consisting of the
components of magnified elements of stH. We also refer to the fine-resolution version of an original sort as its
fine-resolution counterpart.

For instance, for the sort place = {r1, ..., rn} in ΣH (representing rooms), we have the coarse-resolution copy:

place* = {r1, ..., rn}

and its fine-resolution counterpart:

place = {c1, ..., cm}

where c1, ..., cm are newly discovered cells in the rooms. Although the original version and its fine-resolution
counterpart have the same name, this does not cause a problem because they belong to different signatures. In fact,
it proves to be convenient for the construction of the axioms of DL. Also, for each basic sort stH of ΣH whose elements
were not magnified by the increase in resolution, ΣL contains stL = stH.

• ΣL contains the static relation component(oi, o), which holds iff object oi of the fine-resolution sort st is a newly
discovered component of magnified object o from st*.

Continuing with Example 1 and Example 4, we have:

component : place × place* → boolean
where component(c, r) is true iff cell c is part of room r.

• For every function symbol / : st \,..., st n —> sto of E#: 
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- If signature of / contains magnified sorts, Li contains function symbol f* : st[,..., st' n — > st' 0 , where st\ = 
st* if sti is magnified and stj = stj otherwise. 

- / : st i,..., st„ —» sto is a function symbol of Li. 

In our example, E/ will include, for instance: 

loc* : thing —> place* 


and 

loc : thing —> place 

Although the second fluent looks identical to its coarse-resolution counterpart in E//, the meaning of place in it 
is different—elements of sort place are rooms in E//, but are cells in E/ . In a similar manner, E/ will include 
both next Jo{place, place) and next Jo* (place*. place*) —the former describes two cells that are next to each 
other while the latter describes two rooms that are next to each other. 

• Actions of Li include (a) every action in E// with its magnified parameters replaced by their fine-resolution 
counterparts; and (b) knowledge-producing action: 

test {robot , fl uent , value) 

which activates algorithms on the robot R to check if the value of an observable fluent F in a given state is Y. 
Note that this action only changes basic knowledge fluents (see below). 

In our example, the sort action of $\Sigma_L$ will have (a) the original actions grasp and putdown; (b) action move(robot, cell) of a robot moving to an (adjacent) cell; and (c) test actions for testing the values of each observable fluent, e.g., instances of test(R, loc(Th), C) and test(R, in_hand(R, O), true) for specific cells and objects.

• $\Sigma_L$ includes the basic knowledge fluents directly_observed, indirectly_observed and can_be_tested, and the defined fluents may_discover and observed. These fluents are used to describe observations of the environment and the axioms governing them; details are provided below in the context of the axioms of $\mathcal{D}_L$.


2. Constructing the refinement $\tau_L$ of $\tau_H$.
(b) Axioms of $\mathcal{D}_L$


Axioms of the refined system description $\mathcal{D}_L$ include:

• All axioms of $\mathcal{D}_H$.

Although this set of axioms is syntactically identical to the axioms of $\mathcal{D}_H$, their ground instantiations are different because their variables may range over different sorts. Continuing with our example of refining the description in Example [?], variables for places are ground using names of rooms in $\mathcal{D}_H$, but using names of cells in $\mathcal{D}_L$. There are also differences in the definition of statics, as discussed further below.

• Axioms relating coarse-resolution domain properties and their fine-resolution counterparts. Assuming that the only sort of the signature of the original fluent f influenced by the increase in resolution is its range, the axiom has the form:

f*(X_1, ..., X_m) = Y if component(C_1, X_1), ..., component(C_m, X_m), component(C, Y),    (20)
                         f(C_1, ..., C_m) = C

where, for a non-magnified object O, component(O, O) holds. In our example, we have:

loc*(Th) = Rm if component(C, Rm), loc(Th) = C

next_to*(Rm1, Rm2) if component(C1, Rm1), component(C2, Rm2), next_to(C1, C2)

In general, we need to add component(C_i, X_i) to the body of the rule for those X_i that were affected by the increase in resolution, and change the value of the domain property appropriately in the head and the body of the rule. A key requirement is that the head of the axiom holds in the coarse-resolution system description if and only if the body of the axiom holds in the fine-resolution (i.e., refined) system description.

• Axioms for observing the environment.

In addition to actions that change domain properties, the refinement includes actions and fluents for observing the environment. As stated above, the values of fluents of $\Sigma_L$ are determined by the action test(robot, fluent, value). Even though the second parameter of this action is an observable fluent, its value may not always be testable by a sensor; e.g., we assume that a robot cannot check whether an object is located in a cell unless the robot is also located in that cell. This is represented by a domain-dependent basic knowledge fluent:

can_be_tested : robot × fluent × value → boolean

whose definition is supplied by the designer. In our example, the corresponding axioms may be:

can_be_tested(R, loc(Th), C) if loc(R) = C
can_be_tested(R, in_hand(R, O), V)

Next, to model the results of sensing, we use another basic knowledge fluent:

directly_observed : robot × fluent × value → outcomes

where

outcomes = {true, false, undet}

is a sort with three possible values: true, false, and undetermined. The initial value of directly_observed, for all its parameters, is undet. The direct effect of test(R, F, Y) is then described by the following causal laws:

test(R, F, Y) causes directly_observed(R, F, Y) = true if F = Y    (21)
test(R, F, Y) causes directly_observed(R, F, Y) = false if F ≠ Y

and the executability condition:

impossible test(R, F, Y) if ¬can_be_tested(R, F, Y)    (22)

In the context of the refinement in Example 4, if a robot rob1 located in cell c is checking for an object o, directly_observed(rob1, loc(o), c) will be true iff o is in c during testing, and false iff o is not in c. These values are preserved by inertia until the same cell is tested again and the state is observed to have changed. If robot rob1 has not yet tested cell c for object o, the value of directly_observed(rob1, loc(o), c) remains undet.

In addition to directly observing the fine-resolution fluents using action test, the robot should be able to test the values of observable coarse-resolution fluents inherited from $\Sigma_H$. For instance, although rob1 cannot directly observe whether object o is in room r, it can do so indirectly: o is indirectly observed to be in r if it is observed in some cell c of r, and indirectly observed not to be in r if all the cells of r have been examined without observing o. Such a relationship is assumed to hold for all magnified fluents. This assumption is axiomatized using a basic knowledge fluent:

indirectly_observed : robot × fluent* × value* → outcomes

which is initially set to undet, and a defined fluent:

may_discover : robot × fluent* × value* → boolean

where may_discover(R, F*, Y) holds if robot R may discover whether the value of F* (the copy of a coarse-resolution fluent whose value is determined by the values of its components) is Y. The axioms for indirect observation are:

indirectly_observed(R, F*, Y) = true if directly_observed(R, F, C) = true,    (23)
                                        component(C, Y)

indirectly_observed(R, F*, Y) = false if indirectly_observed(R, F*, Y) ≠ true,
                                         ¬may_discover(R, F*, Y)

may_discover(R, F*, Y) if indirectly_observed(R, F*, Y) ≠ true,
                          component(C, Y),
                          directly_observed(R, F, C) = undet

where F and F* are the fine-resolution counterpart and the coarse-resolution version (respectively) of a fluent in $\Sigma_H$ that has been magnified.

In our example, the axioms for fluent loc look as follows:

indirectly_observed(R, loc*(O), Room) = true if directly_observed(R, loc(O), C) = true,
                                                component(C, Room).

indirectly_observed(R, loc*(O), Room) = false if indirectly_observed(R, loc*(O), Room) ≠ true,
                                                 ¬may_discover(R, loc*(O), Room).

may_discover(R, loc*(O), Room) if indirectly_observed(R, loc*(O), Room) ≠ true,
                                  component(C, Room),
                                  directly_observed(R, loc(O), C) = undet.

Finally, a defined fluent is used to say that any fluent observed directly or indirectly has been observed:

observed : robot × fluent × value → boolean

where observed(R, F, Y) is true iff the most recent observation of F returned value Y. The following axioms relate this fluent to the basic knowledge fluents defined earlier:

observed(R, F, Y) = true if directly_observed(R, F, Y) = true    (24)
observed(R, F, Y) = true if indirectly_observed(R, F, Y) = true
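To make the observation axioms concrete, the following is a minimal Python sketch, not part of the architecture itself, that evaluates axiom (20) for loc, the test laws (21) and (22), and the indirect-observation axioms (23) for a single room. It assumes a hypothetical four-cell map (c1 to c4) and stores directly_observed(R, loc(Th), C) as dictionary entries keyed by (R, Th, C), with absent keys read as undet.

```python
# Hypothetical component relation for Example 4: each cell maps to the room containing it.
COMPONENT = {"c1": "office", "c2": "office", "c3": "kitchen", "c4": "kitchen"}


def cells_of(room):
    """Cells C such that component(C, room) holds."""
    return [c for c, r in COMPONENT.items() if r == room]


def loc_star(loc):
    """Axiom (20) for loc: loc*(Th) = Rm if component(C, Rm), loc(Th) = C."""
    return {thing: COMPONENT[cell] for thing, cell in loc.items()}


def execute_test(loc, observed, robot, thing, cell):
    """test(R, loc(Th), C): causal laws (21) guarded by executability condition (22),
    with can_be_tested(R, loc(Th), C) if loc(R) = C.
    observed[(R, Th, C)] stands for directly_observed(R, loc(Th), C)."""
    if loc[robot] != cell:
        raise ValueError(f"impossible test({robot}, loc({thing}), {cell})")
    # The recorded value persists (inertia) until the same cell is tested again.
    observed[(robot, thing, cell)] = "true" if loc[thing] == cell else "false"


def indirectly_observed(observed, robot, thing, room):
    """Axioms (23): value of indirectly_observed(R, loc*(Th), room)."""
    statuses = [observed.get((robot, thing, c), "undet") for c in cells_of(room)]
    if "true" in statuses:
        return "true"   # Th was seen in some cell of the room
    if "undet" in statuses:
        return "undet"  # may_discover holds: some cell of the room is untested
    return "false"      # every cell was tested without seeing Th


loc = {"rob1": "c3", "cup1": "c4"}
observed = {}
execute_test(loc, observed, "rob1", "cup1", "c3")
print(indirectly_observed(observed, "rob1", "cup1", "kitchen"))  # undet
loc["rob1"] = "c4"
execute_test(loc, observed, "rob1", "cup1", "c4")
print(indirectly_observed(observed, "rob1", "cup1", "kitchen"))  # true
```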

This completes our construction of $\mathcal{D}_L$. We are now ready to provide a formal definition of refinement.

Definition 6. [Refinement of a state]

A state $\delta$ of $\tau_L$ is said to be a refinement of a state $\sigma$ of $\tau_H$ if:

• For every magnified domain property f from the signature of $\Sigma_H$:

$f(x) = y \in \sigma$ iff $f^*(x) = y \in \delta$

• For every other domain property f of $\Sigma_H$:

$f(x) = y \in \sigma$ iff $f(x) = y \in \delta$
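Definition 6 can be checked mechanically once states are written down explicitly. The sketch below assumes, purely for illustration, that a state is a complete Python dictionary from (property name, argument tuple) to value, and that the coarse copy of a magnified property f is named f*; it projects a fine-resolution state onto the coarse signature and compares the result with the coarse state, which covers both directions of each iff.

```python
def project(delta, coarse_props, magnified):
    """Coarse atoms fixed by a fine state delta under Definition 6: f*(x) = y for
    magnified f, f(x) = y for other domain properties of Sigma_H; the rest is dropped."""
    out = {}
    for (g, args), y in delta.items():
        if g.endswith("*") and g[:-1] in magnified:
            out[(g[:-1], args)] = y
        elif g in coarse_props and g not in magnified:
            out[(g, args)] = y
    return out


def is_refinement(sigma, delta, coarse_props, magnified):
    """delta refines sigma iff the projection matches sigma atom for atom."""
    return project(delta, coarse_props, magnified) == sigma


sigma = {("loc", ("rob1",)): "office", ("in_hand", ("rob1", "cup1")): False}
delta = {
    ("loc*", ("rob1",)): "office",
    ("loc", ("rob1",)): "c2",  # fine-resolution value, ignored by the projection
    ("in_hand", ("rob1", "cup1")): False,
    ("directly_observed", ("rob1", "loc(rob1)", "c2")): "true",  # knowledge fluent, ignored
}
print(is_refinement(sigma, delta, {"loc", "in_hand"}, {"loc"}))  # True
```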
Definition 7. [Refinement of a system description]

Let $\mathcal{D}_L$ and $\mathcal{D}_H$ be system descriptions with transition diagrams $\tau_L$ and $\tau_H$, respectively. $\mathcal{D}_L$ is a refinement of $\mathcal{D}_H$ if:

1. States of $\tau_L$ are the refinements of states of $\tau_H$.

2. For every transition $\langle \sigma_1, a^H, \sigma_2 \rangle$ of $\tau_H$, every fluent f in a set F of observable fluents, and every refinement $\delta_1$ of $\sigma_1$, there is a path P in $\tau_L$ from $\delta_1$ to a refinement $\delta_2$ of $\sigma_2$ such that:

(a) Every action of P is executed by the robot that executes $a^H$.

(b) Every state of P is a refinement of $\sigma_1$ or $\sigma_2$, i.e., no unrelated fluents are changed.

(c) observed(R, f, Y) = true ∈ $\delta_2$ iff (f = Y) ∈ $\delta_2$, and observed(R, f, Y) = false ∈ $\delta_2$ iff (f ≠ Y) ∈ $\delta_2$.

Proposition 2. [Refinement]

Let $\mathcal{D}_H$ and $\mathcal{D}_L$ be the coarse-resolution and fine-resolution system descriptions for the office domain in Example 1. Then $\mathcal{D}_L$ is a refinement of $\mathcal{D}_H$.

The proof of this proposition is in Appendix B. As stated in Section 4, it is the designer's responsibility to establish that the fine-resolution system description of any given domain is a refinement of the coarse-resolution system description.

7.2 Randomization

The system description of transition diagram $\tau_L$, obtained by refining transition diagram $\tau_H$, is insufficient to implement a coarse-resolution transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle \in \tau_H$. We still need to capture the non-determinism in action execution and observations, which brings us to the third step of the design methodology (Section 4).

3. Provide domain-specific information and randomize the fine-resolution description of the domain to capture 
the non-determinism in action execution. 


This step models the non-determinism by first creating $\mathcal{D}_{LR}$, the randomized fine-resolution system description, by:

• Replacing each action's deterministic causal laws in $\mathcal{D}_L$ by non-deterministic ones; and

• Modifying the signature by declaring each affected fluent as a random fluent, i.e., defining the set of values the fluent can choose from when the action is executed. A defined fluent may be introduced to describe this set of values in terms of other variables.

For instance, consider a robot moving to a specific cell in the office. During this move, the robot can reach the desired cell or one of the neighboring cells. The causal law for the move action in $\mathcal{D}_{LR}$ can therefore be (re)stated as:

move(R, C2) causes loc(R) = {C : range(loc(R), C)}    (25)

where the robot can only move to a cell that is next to its current cell location:

impossible move(R, C2) if loc(R) = C1, ¬next_to(C1, C2)    (26)

and the relation range is a defined fluent given by:

range(loc(R), C) if loc(R) = C
range(loc(R), C) if loc(R) = C1, next_to(C, C1)

i.e., the cell the robot is currently in, and the cells next to this cell, are all within the range of the robot. In addition, the fluent affected by the change in this causal law is declared as a random fluent, and its definition is changed to:

loc(X) = {Y : p(Y)}

where a thing's location is one of a set of values that satisfy a given property; in the current example, which considers the robot's location, this property is range as described above. In a similar manner, the non-deterministic version of the test action used to determine the robot's cell location in the office is given by:

test(rob1, loc(rob1), c_i) causes directly_observed(rob1, loc(rob1), c_i) = {true, false} if loc(rob1) = c_i

which indicates that the result of the test action may not always be as expected; here, c_i ranges over the cells in the office. As with refinement, it is the designer's responsibility to provide the domain-specific information needed for randomization.
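A minimal sketch of how the randomized move law (25), its executability condition (26) and the defined fluent range interact, assuming a hypothetical three-cell corridor. The random choice only illustrates that loc(R) may take any value in range; the actual probabilities come from the statistics described next, so the uniform choice here is a placeholder.

```python
import random

# Hypothetical adjacency between cells; next_to is assumed symmetric here.
NEXT_TO = {("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2")}


def in_range(current):
    """range(loc(R), C) if loc(R) = C, or if loc(R) = C1 and next_to(C, C1)."""
    return {current} | {c for (c, c1) in NEXT_TO if c1 == current}


def move(current, target, rng):
    """Randomized move(R, C2): executability condition (26), then causal law (25)."""
    if (current, target) not in NEXT_TO:
        raise ValueError(f"impossible move to {target} from {current}")
    # Law (25) only fixes the set of possible values of the random fluent loc(R);
    # the uniform choice is a placeholder for the probabilities learned later.
    return rng.choice(sorted(in_range(current)))


print(sorted(in_range("c2")))  # ['c1', 'c2', 'c3']
print(move("c2", "c3", random.Random(0)) in in_range("c2"))  # True
```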
Collecting statistics: Once the fine-resolution system description has been randomized, experiments are run and statistics are collected to compute the probabilities of action outcomes and the reliability of observations. This corresponds to the fourth step of the design methodology (Section 4).


4. Run experiments, collect statistics, and compute probabilities of action outcomes and the reliability of observations.


Specifically, we need to compute the: 

• Causal probabilities for the outcomes of actions; and 

• Quantitative model for observations, which provides the probability of the observations being correct. 

This collection of statistics is typically a one-time process performed in an initial training phase. Also, the statistics are computed separately for each basic fluent in $\mathcal{D}_{LR}$. To collect the statistics, we consider one non-deterministic causal law in $\mathcal{D}_{LR}$ at a time. We sample some ground instances of this causal law, e.g., corresponding to different atoms in the causal law. The robot then executes the action corresponding to each sampled instance multiple times, and collects statistics (i.e., counts) of the number of times each possible outcome (i.e., value) is obtained. The robot also collects information about the amount of time taken to execute each action.

As an example, consider a ground instance of the non-deterministic causal law for move, with specific locations of a robot in a specific room:

move(rob1, c2) causes loc(rob1) = {c1, c2, c3}

where rob1 in cell c1 can end up in one of three possible cells when it tries to move to c2. In ten attempts to move to c2, assume that rob1 remains in c1 in one trial, reaches c2 in eight trials, and reaches c3 in one trial. The maximum likelihood estimates of the probabilities of these outcomes are then 0.1, 0.8 and 0.1, respectively, and the estimated probability of rob1 moving to any other cell is zero. Similar statistics are collected for other ground instances of this causal law, and averaged to compute the statistics for the fluent loc for rob1. The same approach is used to collect statistics for other causal laws and fluents, including those related to knowledge actions and basic knowledge fluents. For instance, assume that the collected statistics indicate that testing for the presence of a textbook in a cell requires twice as much computational time (and thus effort) as testing for the presence of a printer. This information, and the relative accuracy of recognizing textbooks and printers, will be used to determine the relative value of executing the corresponding test actions (see Section 8.2).
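The estimates in this example are relative frequencies. The sketch below reproduces them; the average function illustrates one way to combine ground instances, assuming (our hypothetical convention, not necessarily the authors') that outcomes have first been relabeled relative to the action, e.g. stay, target or other, so that different instances share outcome names.

```python
from collections import Counter


def mle(outcomes):
    """Maximum likelihood estimate of each outcome's probability: count / trials."""
    counts = Counter(outcomes)
    return {value: k / len(outcomes) for value, k in counts.items()}


def average(estimates):
    """Average per-instance estimates; an outcome missing from an instance counts as 0."""
    keys = set().union(*estimates)
    return {k: sum(e.get(k, 0.0) for e in estimates) / len(estimates) for k in keys}


# Ten attempts by rob1 in c1 to move to c2, as in the example above.
trials = ["c1"] + ["c2"] * 8 + ["c3"]
print(mle(trials))  # {'c1': 0.1, 'c2': 0.8, 'c3': 0.1}
```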

There are some important caveats related to the collection of statistics. First, the collection of statistics depends on the availability of relevant ground truth information, e.g., the actual location of rob1 after executing move(rob1, c2); this information is provided by an external high-fidelity sensor during the training phase, or by a human. Second, although we do not do so in our experiments, it is possible to use heuristic functions to model the computational effort, and to update the statistics incrementally over time; if any heuristic functions are used, the designer has to make them available to automate subsequent steps of our control loop. Third, considering all ground instances of one causal law at a time can require a lot of training in complex domains, but this is often unnecessary. For instance, the statistics of moving from a cell to one of its neighbors are often the same for all cells in a room and for any given robot. Similarly, if the robot and an object are (are not) in the same cell, the probability of the robot observing (not observing) the object is often the same for any cell. The designer thus only considers representative samples of the distinct cases to collect statistics; e.g., statistics corresponding to moving between cells will be collected in two different rooms only if these statistics are expected to be different, for instance because the rooms have different floor coverings.

7.3 Zoom

Reasoning probabilistically about the entire randomized fine-resolution system description can become computationally intractable. For any given transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle \in \tau_H$, this intractability could be offset by limiting fine-resolution probabilistic reasoning to the part of transition diagram $\tau_{LR}$ whose states are the refinements of $\sigma_1$ and $\sigma_2$. For instance, for the state transition corresponding to a robot moving from the office to the kitchen in Example 4, i.e., $a^H$ = move(rob1, kitchen), we need only consider states of $\tau_{LR}$ in which the robot's location is a cell in the office or the kitchen. However, these states would still contain fluents and actions not relevant to the execution of $a^H$, e.g., locations of domain objects, and the grasp action. What we need is a fine-resolution transition diagram $\tau_{LR}(T)$ whose states contain no information unrelated to the execution of $a^H$, and whose actions are limited to those that may be useful for such an execution. In the case of $a^H$ = move(rob1, kitchen), for instance, states of $\tau_{LR}(T)$ should not contain any information about domain objects. In the proposed architecture, the controller constructs such a zoomed fine-resolution system description $\mathcal{D}_{LR}(T)$ in two steps. First, a new action description is constructed by focusing on the transition T, creating a system description $\mathcal{D}_H(T)$ that consists of ground instances of $\mathcal{D}_H$ built from object constants of $\Sigma_H$ relevant to T. In the second step, the refinement of $\mathcal{D}_H(T)$ is extracted from $\mathcal{D}_{LR}$ to obtain $\mathcal{D}_{LR}(T)$. We first consider the requirements of the zoom operation.

Definition 8. [Requirements of zoom]

The zoom operation should satisfy the following requirements:

1. Every path in the zoomed transition diagram should correspond to a path in the transition diagram before zooming. In other words, for every path $P_z$ of $\tau_{LR}(T)$ between states $\delta_1^z \subseteq \delta_1$ and $\delta_2^z \subseteq \delta_2$, where $\delta_1$ and $\delta_2$ are refinements of $\sigma_1$ and $\sigma_2$ respectively, there is a path P between states $\delta_1$ and $\delta_2$ in $\tau_{LR}$.

2. Every path in the transition diagram before zooming should correspond to a path in the zoomed transition diagram. In other words, for every path P of $\tau_{LR}$, formed by actions of $\tau_{LR}(T)$, between states $\delta_1$ and $\delta_2$ that are refinements of $\sigma_1$ and $\sigma_2$ respectively, there is a path $P_z$ of $\tau_{LR}(T)$ between states $\delta_1^z \subseteq \delta_1$ and $\delta_2^z \subseteq \delta_2$.

3. Paths in $\tau_{LR}(T)$ should be of sufficiently high probability for the probabilistic solver to find them.

To construct such a zoomed system description $\mathcal{D}_{LR}(T)$ defining transition diagram $\tau_{LR}(T)$, we begin by defining relObConst(T), the collection of object constants of signature $\Sigma_H$ of $\mathcal{D}_H$ relevant to transition T.

Definition 9. [Constants relevant to a transition]

For any given (ground) transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle$ of $\tau_H$, relObConst(T) denotes the minimal set of object constants of signature $\Sigma_H$ of $\mathcal{D}_H$ closed under the following rules:

1. Object constants from $a^H$ are in relObConst(T);

2. If f(x_1, ..., x_n) = y belongs to $\sigma_1$ or $\sigma_2$, but not both, then x_1, ..., x_n, y are in relObConst(T);

3. If the body B of an executability condition of $a^H$ contains an occurrence of a term f(x_1, ..., x_n), and f(x_1, ..., x_n) = y ∈ $\sigma_1$, then x_1, ..., x_n, y are in relObConst(T).

Constants from relObConst(T) are said to be relevant to T. In the context of Example 4, consider transition $T = \langle \sigma_1, grasp(rob1, cup1), \sigma_2 \rangle$ such that loc(rob1) = kitchen and loc(cup1) = kitchen are in $\sigma_1$. Then relObConst(T) consists of rob1 of sort robot and cup1 of sort kitchenware (based on the first rule above), and kitchen of sort place (based on the third rule above and the fourth axiom in Statement 19 in Example 4). For more details, see Example 6.
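Because none of the three rules in Definition 9 depends on the set being constructed, a single pass computes the minimal closed set. The sketch below assumes states are dictionaries from ground terms (f, args) to values, and that the caller supplies the ground terms appearing in the bodies of the action's executability conditions; for Example 6 we assume those terms are loc(rob1) and loc(cup1), since the fourth axiom of Statement 19 is not reproduced here. Treating boolean values as non-constants is also an assumption of this sketch.

```python
def rel_ob_const(action, sigma1, sigma2, exec_terms):
    """Definition 9. States map ground terms (f, args) to values; exec_terms lists the
    ground terms occurring in bodies of executability conditions of the action.
    Boolean values are not treated as object constants (an assumption of this sketch)."""
    _, args = action
    consts = set(args)                              # rule 1: constants of a^H
    for term in sigma1.keys() | sigma2.keys():      # rule 2: atoms in exactly one state
        y1, y2 = sigma1.get(term), sigma2.get(term)
        if y1 != y2:
            consts.update(term[1])
            consts.update(v for v in (y1, y2) if isinstance(v, str))
    for term in exec_terms:                         # rule 3: executability bodies
        if term in sigma1:
            consts.update(term[1])
            if isinstance(sigma1[term], str):
                consts.add(sigma1[term])
    return consts


# Example 6; exec_terms assumes grasp is impossible unless loc(R) and loc(O) agree.
s1 = {("loc", ("rob1",)): "kitchen", ("loc", ("cup1",)): "kitchen",
      ("in_hand", ("rob1", "cup1")): False}
s2 = {**s1, ("in_hand", ("rob1", "cup1")): True}
terms = [("loc", ("rob1",)), ("loc", ("cup1",))]
print(sorted(rel_ob_const(("grasp", ("rob1", "cup1")), s1, s2, terms)))
# ['cup1', 'kitchen', 'rob1']
```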

Now we are ready for the first step of the construction of $\mathcal{D}_{LR}(T)$. Object constants of the signature $\Sigma_H(T)$ of the new system description $\mathcal{D}_H(T)$ are those of relObConst(T). Basic sorts of $\Sigma_H(T)$ are the non-empty intersections of basic sorts of $\Sigma_H$ with relObConst(T). The domain properties and actions of $\Sigma_H(T)$ are those of $\Sigma_H$ restricted to the basic sorts of $\Sigma_H(T)$, and the axioms of $\mathcal{D}_H(T)$ are the restrictions of the axioms of $\mathcal{D}_H$ to $\Sigma_H(T)$. It is easy to show that the system descriptions $\mathcal{D}_H$ and $\mathcal{D}_H(T)$ satisfy the following requirement: for any transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle$ of transition diagram $\tau_H$ corresponding to system description $\mathcal{D}_H$, there exists a transition $\langle \sigma_1(T), a^H, \sigma_2(T) \rangle$ in transition diagram $\tau_H(T)$ corresponding to system description $\mathcal{D}_H(T)$, where $\sigma_1(T)$ and $\sigma_2(T)$ are obtained by restricting $\sigma_1$ and $\sigma_2$ (respectively) to the signature $\Sigma_H(T)$.
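The restriction step can be sketched with plain set operations. The constants rob2, library, tb1 and the color value red below are hypothetical additions to Example 5, included only to show what gets filtered out; states are encoded as dictionaries from (property, args) to values, which is an assumption of this sketch.

```python
def restrict_sorts(sorts, relevant):
    """Basic sorts of Sigma_H(T): non-empty intersections of Sigma_H's sorts with relObConst(T)."""
    return {name: members & relevant for name, members in sorts.items() if members & relevant}


def restrict_state(sigma, relevant):
    """sigma(T): keep atoms f(x) = y whose arguments, and y if it is an object, are relevant."""
    return {(f, args): y for (f, args), y in sigma.items()
            if set(args) <= relevant and (not isinstance(y, str) or y in relevant)}


# Example 5 with hypothetical extra constants rob2, library, tb1 and color value "red".
sorts = {"robot": {"rob1", "rob2"}, "place": {"office", "kitchen", "library"}, "textbook": {"tb1"}}
relevant = {"rob1", "office", "kitchen"}
sigma1 = {("loc", ("rob1",)): "office", ("loc", ("rob2",)): "library",
          ("broken", ("rob1",)): False, ("color", ("rob2",)): "red"}
print(restrict_state(sigma1, relevant))
# {('loc', ('rob1',)): 'office', ('broken', ('rob1',)): False}
```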

In the second step, the zoomed system description $\mathcal{D}_{LR}(T)$ is constructed by refining the system description $\mathcal{D}_H(T)$. Unlike the description of refinement in Section 7.1, which requires the designer to supply domain-specific information, we do not need any additional input from the designer for refining $\mathcal{D}_H(T)$, and can automate the entire zoom operation. We now provide a formal definition of the zoomed system description.
Definition 10. [Zoomed system description]

For a coarse-resolution transition T, $\mathcal{D}_{LR}(T)$ with signature $\Sigma_{LR}(T)$ is said to be the zoomed fine-resolution system description if:

1. Basic sorts of $\Sigma_{LR}(T)$ are those of $\mathcal{D}_{LR}$ that are refined counterparts of the basic sorts of $\mathcal{D}_H(T)$.

2. Functions of $\Sigma_{LR}(T)$ are those of $\mathcal{D}_{LR}$ restricted to the basic sorts of $\Sigma_{LR}(T)$.

3. Actions of $\Sigma_{LR}(T)$ are those of $\mathcal{D}_{LR}$ restricted to the basic sorts of $\Sigma_{LR}(T)$.

4. Axioms of $\mathcal{D}_{LR}(T)$ are those of $\mathcal{D}_{LR}$ restricted to the signature $\Sigma_{LR}(T)$.


Consider $T = \langle \sigma_1, move(rob1, kitchen), \sigma_2 \rangle$ such that loc(rob1) = office ∈ $\sigma_1$. The basic sorts of $\Sigma_{LR}(T)$ include robot_L^z = {rob1}, place*_L^z = {office, kitchen} and place_L^z = {c_i : c_i ∈ kitchen ∪ office}. The functions of $\Sigma_{LR}(T)$ include loc*(rob1) taking values from place*_L^z, loc(rob1) taking values from place_L^z, range(loc(rob1), place_L^z), statics next_to*(place*_L^z, place*_L^z) and next_to(place_L^z, place_L^z), properly restricted functions related to testing the values of fluent terms, etc. The actions include move(rob1, c_i) and test(rob1, loc(rob1), c_i), where c_i are individual elements of place_L^z. Finally, restricting the axioms of $\mathcal{D}_{LR}$ to the signature $\Sigma_{LR}(T)$ removes the causal laws for grasp and putdown, and the first state constraint corresponding to Statement 18 in $\mathcal{D}_{LR}$. Furthermore, in the causal law and executability condition for move, we only consider cells in the kitchen or the office.


Based on Definition 7 of refinement and Proposition 2, it is easy to show that the system descriptions $\mathcal{D}_H(T)$ and $\mathcal{D}_{LR}(T)$ satisfy the following requirement: for any transition $\langle \sigma_1(T), a^H, \sigma_2(T) \rangle$ in transition diagram $\tau_H(T)$ corresponding to system description $\mathcal{D}_H(T)$, where $\sigma_1(T)$ and $\sigma_2(T)$ are obtained by restricting states $\sigma_1$ and $\sigma_2$ (respectively) of $\mathcal{D}_H$ to signature $\Sigma_H(T)$, there exists a path in $\tau_{LR}(T)$ between every refinement of $\sigma_1(T)$ and a refinement of $\sigma_2(T)$. We now provide two examples of constructing the zoomed system description.


Example 5. [First example of zoom]

As an illustrative example, consider the transition $T = \langle \sigma_1, move(rob1, kitchen), \sigma_2 \rangle$ such that loc(rob1) = office ∈ $\sigma_1$. In addition to the description in Example 4, assume that the domain includes (a) boolean fluent broken(robot); and (b) fluent color(robot) taking a value from a set of colors. There is also an executability condition:

impossible move(Rb, Pl) if broken(Rb)

Intuitively, color(Rb) and broken(Rb), where Rb ≠ rob1, are not relevant to $a^H$, but broken(rob1) is relevant. Specifically, based on Definition 9, relObConst(T) consists of rob1 of sort robot, and {kitchen, office} of sort place; the basic sorts of $\Sigma_H(T)$ are the intersections of these sorts with those of $\Sigma_H$. The domain properties and actions of signature $\Sigma_H(T)$ are restricted to these basic sorts, and the axioms of $\mathcal{D}_H(T)$ are those of $\mathcal{D}_H$ restricted to $\Sigma_H(T)$, e.g., they only include suitably ground instances of the first axiom in Statement 17, the second axiom in Statement 18, and the first three axioms in Statement 19.

Now, the signature $\Sigma_{LR}(T)$ of the zoomed system description $\mathcal{D}_{LR}(T)$ has the following:

• Basic sorts robot_L^z = {rob1}, place*_L^z = {office, kitchen} and place_L^z = {c_i : c_i ∈ kitchen ∪ office}.

• Functions that include (a) fluents loc(robot_L^z) and loc*(robot_L^z) that take values from place_L^z and place*_L^z respectively, and range(loc(robot_L^z), place_L^z); (b) static relations next_to*(place*_L^z, place*_L^z) and next_to(place_L^z, place_L^z); (c) broken(robot_L^z); and (d) knowledge fluents, e.g., directly_observed(robot_L^z, loc(robot_L^z), place_L^z), etc.

• Actions that include (a) move(robot_L^z, place_L^z); and (b) test(robot_L^z, loc(robot_L^z), place_L^z).


The axioms of $\mathcal{D}_{LR}(T)$ are those of $\mathcal{D}_{LR}$ restricted to $\Sigma_{LR}(T)$, e.g., they include:

move(rob1, c_j) causes loc(rob1) = {C : range(loc(rob1), C)}
test(rob1, loc(rob1), c_j) causes directly_observed(rob1, loc(rob1), c_j) = {true, false} if loc(rob1) = c_j
impossible move(rob1, c_j) if loc(rob1) = c_i, ¬next_to(c_j, c_i)
impossible move(rob1, c_j) if broken(rob1)

where range(loc(rob1), C) may hold for C ∈ {c_i, c_j, c_k}, which are within the range of the robot's current location (c_i) and are elements of place_L^z. Assuming the robot is not broken, each state of $\tau_{LR}(T)$ thus includes an atom of the form loc(rob1) = c_i, where c_i is a cell in the kitchen or the office, ¬broken(rob1), direct observations of this atom, e.g., directly_observed(rob1, loc(rob1), c_i) = true, and statics such as next_to(c_i, c_j), etc. Specific actions include move(rob1, c_i) and test(rob1, loc(rob1), c_i).

As an extension to this example, if rob1 is holding textbook tb1 before executing $a^H$ = move(rob1, kitchen), i.e., in_hand(rob1, tb1) ∈ $\sigma_1$, then $\Sigma_H(T)$ also includes tb1 of sort textbook, and $\Sigma_{LR}(T)$ includes object_L^z = {tb1}. The functions of $\mathcal{D}_{LR}(T)$ include the basic fluent in_hand(robot_L^z, object_L^z) and the corresponding knowledge fluents, and the actions and axioms are suitably restricted.


Example 6. [Second example of zoom]

Consider the transition $T = \langle \sigma_1, grasp(rob1, cup1), \sigma_2 \rangle$ such that loc(rob1) = kitchen is in $\sigma_1$. Note that this example does not have the additional fluents (e.g., broken) considered in Example 5. Based on Definition 9, relObConst(T) consists of rob1 of sort robot, cup1 of sort kitchenware, and kitchen of sort place; signature $\Sigma_H(T)$ and system description $\mathcal{D}_H(T)$ are constructed in a manner similar to that described in Example 5.

Now, the signature $\Sigma_{LR}(T)$ of the zoomed system description $\mathcal{D}_{LR}(T)$ has the following:

• Basic sorts robot_L^z = {rob1}, place*_L^z = {kitchen}, place_L^z = {c_i : c_i ∈ kitchen}, and object_L^z = {cup1}.

• Functions that include (a) basic non-knowledge fluents loc(robot_L^z) and loc(object_L^z) that take values from place_L^z, and loc*(robot_L^z) and loc*(object_L^z) that take values from place*_L^z; (b) fluent range(loc(robot_L^z), place_L^z); (c) static relations next_to*(place*_L^z, place*_L^z) and next_to(place_L^z, place_L^z); and (d) knowledge fluents restricted to the basic sorts and fluents, etc.

• Actions such as (a) move(robot_L^z, place_L^z); (b) grasp(robot_L^z, object_L^z); (c) putdown(robot_L^z, object_L^z); and (d) knowledge-producing actions test(robot_L^z, loc(robot_L^z), place_L^z) and test(robot_L^z, loc(object_L^z), place_L^z), etc.


The axioms of $\mathcal{D}_{LR}(T)$ are those of $\mathcal{D}_{LR}$ restricted to the signature $\Sigma_{LR}(T)$. These axioms include:

move(rob1, c_j) causes loc(rob1) = {C : range(loc(rob1), C)}
grasp(rob1, cup1) causes in_hand(rob1, cup1) = {true, false}
test(rob1, loc(rob1), c_j) causes directly_observed(rob1, loc(rob1), c_j) = {true, false} if loc(rob1) = c_j
test(rob1, loc(cup1), c_j) causes directly_observed(rob1, loc(cup1), c_j) = {true, false} if loc(cup1) = c_j
impossible move(rob1, c_j) if loc(rob1) = c_i, ¬next_to(c_j, c_i)
impossible grasp(rob1, cup1) if loc(rob1) = c_i, loc(cup1) = c_j, c_i ≠ c_j

where range(loc(rob1), C) may hold for C ∈ {c_i, c_j, c_k}, which are within the range of the robot's current location (c_i) and are elements of place_L^z. The states of $\tau_{LR}(T)$ thus include atoms of the form loc(rob1) = c_i and loc(cup1) = c_j, where c_i and c_j are values in place_L^z, in_hand(rob1, cup1), observations, e.g., directly_observed(rob1, loc(rob1), c_i) = true, statics such as next_to(c_i, c_j), etc. Specific actions include move(rob1, c_i), grasp(rob1, cup1), putdown(rob1, cup1), test(rob1, loc(rob1), c_i) and test(rob1, loc(cup1), c_i).

In Examples 5 and 6, the statistics collected earlier (Section 7.2) can be used to assign probabilities to the outcomes of actions; e.g., if action move(rob1, c_i) is executed, the probabilities of the outcomes may be:

$$P(loc(rob1) = c_i) = 0.85$$

$$P\big(loc(rob1) = Cl \mid range(loc(rob1), Cl),\ Cl \neq c_i\big) = \frac{0.15}{|\mathbf{Cl}|}, \quad \text{where } \mathbf{Cl} = \{Cl : range(loc(rob1), Cl),\ Cl \neq c_i\}$$

Similarly, if the robot has to search for cup1 once it reaches the kitchen, and a test action is executed to determine whether cup1 is in cell c_i of the kitchen, the probabilities of the outcomes may be:

$$P\big(directly\_observed(rob1, loc(cup1), c_i) = true \mid loc(cup1) = c_i\big) = 0.9$$

$$P\big(directly\_observed(rob1, loc(cup1), c_i) = false \mid loc(cup1) = c_i\big) = 0.1$$
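A small sketch of the move outcome distribution above, with the 0.85 success probability from the text as a default parameter and the remaining mass split evenly over the other cells in range; the observation probabilities for a test of the cell that actually contains cup1 are recorded as a dictionary. The function name and the single-cell fallback are illustrative assumptions.

```python
def move_outcome_probs(target, cells_in_range, p_success=0.85):
    """Outcome distribution of move(rob1, target): p_success on the target cell and
    (1 - p_success) / |Cl| on each other cell Cl within range."""
    others = sorted(set(cells_in_range) - {target})
    if not others:  # nowhere else to end up (a case the formula above does not cover)
        return {target: 1.0}
    probs = {target: p_success}
    probs.update({c: (1.0 - p_success) / len(others) for c in others})
    return probs


# Observation model for testing cell c_i when cup1 is actually in c_i (values from the text).
OBS_GIVEN_PRESENT = {"true": 0.9, "false": 0.1}

probs = move_outcome_probs("c2", {"c1", "c2", "c3"})
print(probs["c2"], round(probs["c1"], 3))  # 0.85 0.075
```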
Given $\mathcal{D}_{LR}(T)$ and the probabilistic information, the robot now has to execute a sequence of concrete actions that implement the desired transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle$. For instance, a robot searching for cup1 in the kitchen can check cells in the kitchen for cup1 until either the cell location of cup1 is determined with high probability (e.g., > 0.9), or all cells have been examined without locating cup1. In the former case, the probabilistic belief can be elevated to a fully certain statement, and the robot reasons about the action outcome and observations to infer that cup1 is in the kitchen; in the latter case, the robot infers that cup1 is not in the kitchen. Such a probabilistic implementation of an abstract action as a sequence of concrete actions is accomplished by constructing and solving a POMDP, and repeatedly invoking the corresponding policy to choose actions until termination, as described below.


8 POMDP Construction and Probabilistic Execution

In this section, we describe the construction of a POMDP $\mathcal{P}(T)$ as a representation of the zoomed system description $\mathcal{D}_{LR}(T)$ and the learned probabilities of action outcomes (Section 7.2), and the use of $\mathcal{P}(T)$ for the fine-resolution implementation of transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle$ of $\tau_H$. First, Section 8.1 summarizes the use of a POMDP to compute a policy for selecting one or more concrete actions that implement any given abstract action $a^H$. Section 8.2 then describes the steps of the POMDP construction in more detail.




8.1 POMDP overview

A POMDP is described by a tuple $\langle A^P, S^P, b^P, Z^P, T^P, O^P, R^P \rangle$ for specific goal state(s). This formulation of a POMDP builds on the standard formulation [27], and the tuple's elements are:

• $A^P$: set of concrete actions available to the robot.

• $S^P$: set of p-states to be considered for the probabilistic implementation of $a^H$. A p-state is either a projection of states of $\mathcal{D}_{LR}(T)$ on the set of atoms of the form f(x̄) = y, where f(x̄) is a basic non-knowledge fine-resolution fluent term, or a special p-state called the terminal p-state. We use the term “p-state” to differentiate between the states represented by the POMDP and the definition of state used in this paper.

• $b^P$: initial belief state, where a belief state is a probability distribution over $S^P$.

• $Z^P$: set of observations. An observation is a projection of states of $\mathcal{D}_{LR}(T)$ on the set of atoms of basic knowledge fluent terms corresponding to the robot's observation of the value of a fine-resolution fluent term, e.g., directly_observed(robot, f(x̄), y) = outcome, where y is a possible value of the fluent term f(x̄). For simplicity, we use the observation none to replace all instances that have undet as the outcome.

• $T^P : S^P \times A^P \times S^P \to [0, 1]$, the transition function, which defines the probability of transitioning to each p-state when particular actions are executed in particular p-states.

• $O^P : S^P \times A^P \times Z^P \to [0, 1]$, the observation function, which defines the probability of each observation in $Z^P$ when particular actions are executed in particular p-states.

• $R^P : S^P \times A^P \times S^P \to \mathbb{R}$, the reward specification, which encodes the relative immediate reward of taking specific actions in specific p-states.
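As a sketch only, the tuple can be held in a Python dataclass with a consistency check on its probabilistic components. The toy instance is hypothetical: p-states are the cell of cup1, test_c1 stands for test(rob1, loc(cup1), c1), the false-positive rate 0.05 and the reward -1 are invented for illustration, and the terminal p-state and terminal action are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class POMDP:
    """The tuple <A, S, b, Z, T, O, R>, with T, O and R stored as dictionaries."""
    actions: list
    states: list
    belief: dict                            # initial belief state b over S
    observations: list
    T: dict = field(default_factory=dict)   # T[(s, a, s_next)] = probability
    O: dict = field(default_factory=dict)   # O[(s_next, a, z)] = probability
    R: dict = field(default_factory=dict)   # R[(s, a, s_next)] = reward

    def is_well_formed(self, tol=1e-9):
        """b, every T(s, a, .) and every O(s, a, .) must be probability distributions."""
        def is_dist(total):
            return abs(total - 1.0) <= tol
        if not is_dist(sum(self.belief.get(s, 0.0) for s in self.states)):
            return False
        return all(
            is_dist(sum(self.T.get((s, a, s2), 0.0) for s2 in self.states))
            and is_dist(sum(self.O.get((s, a, z), 0.0) for z in self.observations))
            for s in self.states for a in self.actions)


# Hypothetical toy: p-states are the cell of cup1; one action tests cell c1.
cells = ["c1", "c2"]
toy = POMDP(
    actions=["test_c1"], states=cells, belief={"c1": 0.5, "c2": 0.5},
    observations=["true", "false"],
    T={(s, "test_c1", s): 1.0 for s in cells},   # testing does not move cup1
    O={("c1", "test_c1", "true"): 0.9, ("c1", "test_c1", "false"): 0.1,
       ("c2", "test_c1", "true"): 0.05, ("c2", "test_c1", "false"): 0.95},
    R={(s, "test_c1", s): -1.0 for s in cells},  # hypothetical cost of testing
)
print(toy.is_well_formed())  # True
```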

The p-states are considered to be partially observable because they cannot be observed with complete certainty, and the POMDP reasons with probability distributions over the p-states, called belief states. Note that this formulation is based only on a system description and does not include any history of observations and actions; in a standard POMDP formulation, the current p-state is assumed to be the result of all information obtained in previous time steps, i.e., the p-state is assumed to implicitly include the history of observations and actions.

The use of a POMDP has two phases: (1) policy computation; and (2) policy execution. The first phase computes a policy $\pi^P : B^P \to A^P$ that maps belief states to actions, using an algorithm that maximizes the utility (i.e., expected cumulative discounted reward) over a planning horizon; we use a point-based approximate solver [37]. In the second phase, the computed policy is used to repeatedly choose an action in the current belief state, updating the belief state after executing the action and receiving an observation. Belief revision is based on Bayesian updates:


$$b^p_{t+1}(s^p_{t+1}) \propto O^p(s^p_{t+1}, a^p_{t+1}, o^p_{t+1}) \sum_{s^p_t \in S^p} T^p(s^p_t, a^p_{t+1}, s^p_{t+1}) \cdot b^p_t(s^p_t) \qquad (27)$$


where $b^p_{t+1}$ is the belief state at time t + 1. The belief update continues until policy execution is terminated. In our
case, policy execution terminates when doing so has a higher (expected) utility than continuing to execute the policy.
This happens when either the belief in a specific p-state is very high (e.g., > 0.8), or none of the p-states has a
high probability associated with it after invoking the policy several times; the latter case is interpreted as a
failure to execute the coarse-resolution action under consideration. We will use “POMDP-1” to refer to the process of
constructing a POMDP, computing the policy, and using this policy to implement the desired abstract action.
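The belief update of Equation (27) and the threshold-based termination described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: p-states, actions, and observations are plain hashable labels, $T^p$ and $O^p$ are nested dictionaries, and the function names, the action name test-c0, the 0.8 default threshold, and the `max_steps` cutoff (a hypothetical stand-in for "after invoking the policy several times") are all illustrative. In the architecture itself, termination happens because the policy selects a terminal action; the explicit check here only mimics that behavior.

```python
from typing import Dict, Hashable, Optional

Belief = Dict[Hashable, float]


def belief_update(belief: Belief, action: str, observation: str,
                  T: Dict[str, Dict[Hashable, Dict[Hashable, float]]],
                  O: Dict[str, Dict[Hashable, Dict[str, float]]]) -> Belief:
    """Bayesian update of Equation (27).

    T[a][s][s_next] is the probability of reaching s_next from s under action a;
    O[a][s_next][z] is the probability of observing z in s_next after action a.
    """
    unnormalized = {}
    for s_next in belief:
        # Prediction: sum over current p-states, weighted by the current belief.
        predicted = sum(T[action][s].get(s_next, 0.0) * p for s, p in belief.items())
        # Correction: weight by the likelihood of the received observation.
        unnormalized[s_next] = O[action][s_next].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


def termination_status(belief: Belief, steps: int,
                       threshold: float = 0.8, max_steps: int = 20) -> Optional[str]:
    """Hypothetical stand-in for the termination rule described in the text."""
    if max(belief.values()) > threshold:
        return "confident"   # one p-state is very likely: stop
    if steps >= max_steps:
        return "failure"     # no p-state became likely: report failure
    return None              # keep executing the policy


# Usage: a test action that checks whether the textbook is in cell c0,
# with 0.95 sensing accuracy as in the example of Appendix C.
T = {"test-c0": {"c0": {"c0": 1.0}, "c1": {"c1": 1.0}}}
O = {"test-c0": {"c0": {"yes": 0.95, "no": 0.05},
                 "c1": {"yes": 0.05, "no": 0.95}}}
b1 = belief_update({"c0": 0.5, "c1": 0.5}, "test-c0", "yes", T, O)
print(b1)  # approximately {'c0': 0.95, 'c1': 0.05}
print(termination_status(b1, steps=1))  # confident
```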


8.2 POMDP construction 


Next, we describe the construction of POMDP $\mathcal{P}(T)$ for the fine-resolution probabilistic implementation of transition
$T = \langle \sigma_1, a^H, \sigma_2 \rangle \in \tau_H$, using $\mathcal{D}_{LR}(T)$ and the statistics collected in the training phase (Section 7.2). We illustrate these
steps using examples based on the domain described in Example 1, including the example described in Appendix C.

Actions: the set $A^p$ of actions of $\mathcal{P}(T)$ consists of concrete actions from the signature of $\mathcal{D}_{LR}(T)$ and new terminal
actions that terminate policy execution. We use a single terminal action; if $A^p$ is to include domain-specific terminal
actions, it is the designer’s responsibility to specify them. For the discussion below, it will be useful to partition $A^p$ into
three subsets: (1) $A^p_1$, actions that cause a change in the p-states; (2) $A^p_2$, knowledge-producing actions for testing the
values of fluents; and (3) $A^p_3$, terminal actions that terminate policy execution. The example in Appendix C includes
(a) actions from $A^p_1$ that move the robot to specific cells, e.g., move-0 and move-1 cause the robot to move to cells 0 and
1 respectively, and the grasp action; (b) test actions from $A^p_2$ to check if the robot or the target object (textbook) is in
specific cells; and (c) action finish from $A^p_3$ that terminates policy execution.

P-states, initial belief state and observations: the following steps are used to construct $S^p$, $Z^p$ and $b^p_0$.


1. Construct ASP program $\Pi_c(\mathcal{D}_{LR}(T)) \cup \delta^{nd}$. Here, $\Pi_c(\mathcal{D}_{LR}(T))$ is constructed as described in Definition 1
(Section 5.1), and $\delta^{nd}$ is a collection of (a) atoms formed by statics; and (b) disjunctions of atoms formed by
basic fluent terms. Each disjunction is of the form $\{f(\bar{x}) = y_1 \lor \ldots \lor f(\bar{x}) = y_n\}$, where $\{y_1, \ldots, y_n\}$ are the
possible values of basic fluent term $f(\bar{x})$. Observe that AS is an answer set of $\Pi_c(\mathcal{D}_{LR}(T)) \cup \delta^{nd}$ iff it is an
answer set of $\Pi_c(\mathcal{D}_{LR}(T)) \cup g$ for some $g \in G$, where G is the collection of sets of literals obtained by assigning
unique values to each basic fluent term $f(\bar{x})$ in $\delta^{nd}$. This statement follows from the definition of answer set and
the splitting set theorem.


2. Compute answer set(s) of ASP program $\Pi_c(\mathcal{D}_{LR}(T)) \cup \delta^{nd}$. Based on the observation in Step 1 above, and the
well-foundedness of $\mathcal{D}_{LR}(T)$, it is easy to show that each answer set is unique and is a state of $\mathcal{D}_{LR}(T)$.


3. From each answer set, extract atoms of the form $f(\bar{x}) = y$, where $f(\bar{x})$ is a basic non-knowledge fine-resolution
fluent term, to obtain an element of $S^p$. Basic fluent terms corresponding to a coarse-resolution domain property,
e.g., the room location of the robot, are not represented probabilistically and are thus not included in $S^p$. We refer to
such a projection of a state $\delta$ of $\mathcal{D}_{LR}(T)$ as the p-state defined by $\delta$. Also include in $S^p$ an “absorbing” terminal
p-state absb that is reached when a terminal action from $A^p_3$ is executed.


4. From each answer set, extract atoms formed by basic knowledge fluent terms corresponding to the robot sensing
a fine-resolution fluent term’s value, to obtain elements of $Z^p$, e.g., directly_observed(robot, $f(\bar{x})$, y) = outcome.
We refer to such a projection of a state $\delta$ of $\mathcal{D}_{LR}(T)$ as an observation defined by $\delta$. As described earlier, for
simplicity, observation none replaces all instances in $Z^p$ that have undet as the outcome.

5. If all p-states are equally likely, the initial belief state $b^p_0$ is a uniform distribution. If some other distribution is to
be used as the initial belief state, this information has to be provided by the designer. For instance, the designer
may distribute $(1 - \epsilon)$, where $\epsilon$ is a small number such as 0.1, over p-states known to be more likely a priori,
and distribute $\epsilon$ over the remaining p-states.
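Step 5 can be sketched as follows, assuming p-states are hashable labels and the designer supplies the set of a priori more likely p-states. The function name and the default ε = 0.1 are illustrative, and splitting each of the (1 − ε) and ε masses uniformly within its group is one simple reading of “distribute”, not a rule fixed by the architecture.

```python
from typing import AbstractSet, Dict, Hashable, Iterable


def initial_belief(p_states: Iterable[Hashable],
                   likely: AbstractSet[Hashable] = frozenset(),
                   epsilon: float = 0.1) -> Dict[Hashable, float]:
    """Initial belief b_0: uniform, or (1 - epsilon) spread over 'likely' p-states."""
    states = list(p_states)
    likely_states = [s for s in states if s in likely]
    other_states = [s for s in states if s not in likely]
    if not likely_states or not other_states:
        # No designer preference (or nothing to prefer over): uniform distribution.
        return {s: 1.0 / len(states) for s in states}
    belief = {s: (1.0 - epsilon) / len(likely_states) for s in likely_states}
    belief.update({s: epsilon / len(other_states) for s in other_states})
    return belief


print(initial_belief(["s1", "s2", "s3", "s4"]))
# {'s1': 0.25, 's2': 0.25, 's3': 0.25, 's4': 0.25}
print(initial_belief(["s1", "s2", "s3", "s4"], likely={"s1"}))
# s1 gets 0.9; s2, s3 and s4 share the remaining 0.1
```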
In the example in Appendix C, abstract action grasp(rob_1, tb_1) has to be executed in the office. To do so, the robot
has to move and find tb_1 in the office. Example Ihl shows the corresponding $\mathcal{D}_{LR}(T)$, and $\delta^{nd}$ includes (a) atoms
formed by statics, e.g., next_to(c_1, c_2), where c_1 and c_2 are neighboring cells in the office; and (b) disjunctions such as
$\{loc(rob_1) = c_1 \lor \ldots \lor loc(rob_1) = c_n\}$ and $\{loc(tb_1) = c_1 \lor \ldots \lor loc(tb_1) = c_n\}$, where $c_1, \ldots, c_n$ are the cells of the office.
In Step 3, p-states such as $\{loc(rob_1) = c_1, loc(tb_1) = c_1, \neg in\_hand(rob_1, tb_1)\}$ are extracted from the answer sets. In
Step 4, observations such as directly_observed(rob_1, loc(rob_1), c_1) = true and directly_observed(rob_1, loc(tb_1), c_1) =
false are extracted from the answer sets. Finally, the initial belief state $b^p_0$ is set as a uniform distribution (Step 5).
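For small domains, the effect of Steps 1-3 on the example above can be mimicked without an ASP solver: enumerate every assignment of unique values to the basic fluent terms (the sets in G) and keep only the assignments that satisfy the state constraints. The sketch below assumes two office cells and a single hypothetical constraint (an object in the robot’s hand is in the robot’s cell); the function and fluent-label names are illustrative. In the architecture, these p-states come from the answer sets computed in Step 2, not from this brute-force filter.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Union

PState = Union[tuple, str]


def enumerate_p_states(domains: Dict[str, Sequence[object]],
                       consistent: Callable[[Dict[str, object]], bool]) -> List[PState]:
    """Brute-force stand-in for Steps 1-3: assign a unique value to each basic
    fluent term, keep assignments satisfying the constraints, add 'absb'."""
    names = sorted(domains)
    p_states: List[PState] = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if consistent(assignment):
            # A p-state is a set of atoms f(x) = y; keep it as sorted (term, value) pairs.
            p_states.append(tuple(sorted(assignment.items())))
    p_states.append("absb")  # absorbing terminal p-state
    return p_states


cells = ["c1", "c2"]
domains = {"loc(rob_1)": cells, "loc(tb_1)": cells, "in_hand(rob_1,tb_1)": [True, False]}


def held_object_with_robot(s: Dict[str, object]) -> bool:
    # Hypothetical state constraint: a held object is in the robot's cell.
    return not s["in_hand(rob_1,tb_1)"] or s["loc(tb_1)"] == s["loc(rob_1)"]


S_p = enumerate_p_states(domains, held_object_with_robot)
print(len(S_p))  # 7: six consistent assignments plus absb
```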


Transition function and observation function: a transition between p-states of $\mathcal{P}(T)$ is defined as $\langle s_i, a, s_j \rangle \in T^p$
iff there is an action $a \in A^p$ and a transition $\langle \delta_x, a, \delta_y \rangle$ of $\mathcal{D}_{LR}(T)$ such that $s_i$ and $s_j$ are p-states defined by $\delta_x$ and $\delta_y$,
respectively. The probability of $\langle s_i, a, s_j \rangle \in T^p$ equals the probability of $\langle \delta_x, a, \delta_y \rangle$. In a similar manner, $\langle s_i, a, z_j \rangle \in O^p$
iff there is an action $a \in A^p$ and a transition $\langle \delta_x, a, \delta_y \rangle$ of $\mathcal{D}_{LR}(T)$ such that $s_i$ and $z_j$ are a p-state and an observation
defined by $\delta_x$ and $\delta_y$, respectively. The probability of $\langle s_i, a, z_j \rangle \in O^p$ equals the probability of $\langle \delta_x, a, \delta_y \rangle$.

We construct $T^p$ and $O^p$ from $\mathcal{D}_{LR}(T)$ and the statistics collected in the initial training phase (as described in
Section 7.2). First, we augment $\mathcal{D}_{LR}(T)$ with causal laws for proper termination:


finish causes absb
impossible $A^p$ if absb


Next, we note that actions in $A^p_1$ cause p-state transitions but provide no observations, while actions in $A^p_2$ do not cause
p-state changes but provide observations, and terminal actions in $A^p_3$ cause a transition to the absorbing state and provide
no observations. To use state-of-the-art POMDP solvers, we need to represent $T^p$ and $O^p$ as a collection of tables,
one for each action. More precisely, $T^p_a[s_i, s_j] = p$ iff $\langle s_i, a, s_j \rangle \in T^p$ and its probability is p. In a similar manner,
$O^p_a[s_i, z_j] = p$ iff $\langle s_i, a, z_j \rangle \in O^p$ and its probability is p. Algorithm 1 describes the construction of $T^p$ and $O^p$.

Some specific steps of Algorithm 1 are elaborated below.

• After initialization, Lines 3-12 of Algorithm 1 handle special cases. For instance, any terminal action will cause
a transition to the terminal p-state and provide no observations (Lines 4-5).

• An ASP program of the form $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A))$ (Lines 14, 17) is defined as
$\Pi_c(\mathcal{D}_{LR}(T)) \cup val(s_i, 0) \cup Disj(A)$. Here, $Disj(A)$ is a disjunction of the form
$\{occurs(a_1, 0) \lor \ldots \lor occurs(a_n, 0)\}$, where $\{a_1, \ldots, a_n\} \subseteq A$.
Lines 14-16 construct and compute answer sets of such a program to identify all possible p-state transitions as
a result of actions in $A^p_1$, while Lines 17-19 construct and compute answer sets of such a program to identify possible
observations as a result of actions in $A^p_2$.

• Line 16 extracts a statement of the form $occurs(a_k \in A^p_1, 0)$, and p-state $s_j \in S^p$, from each answer set AS, to
obtain p-state transition $\langle s_i, a_k, s_j \rangle$. As stated earlier, a p-state is extracted from an answer set by extracting
atoms formed by basic non-knowledge fluent terms.

• Line 19 extracts a statement of the form $occurs(a_k \in A^p_2, 0)$, and observation $z_j \in Z^p$, from each answer set AS,
to obtain triple $\langle s_i, a_k, z_j \rangle$. As described earlier, an observation is extracted from an answer set by extracting
atoms formed by basic knowledge fluent terms.

• Probabilities obtained experimentally (Section 7.2) are used to set the probabilities of p-state transitions (Line 16) and
observations (Line 19).

In the example in Appendix C, a robot in the office has to pick up a textbook tb_1 believed to be in the office.
This example assumes that a move action from one cell to a neighboring cell succeeds with probability 0.95; with
probability 0.05 the robot remains in its current cell. It is also assumed that with probability 0.95 the robot observes
(does not observe) the textbook when it exists (does not exist) in the cell the robot is currently in. The corresponding
$T^p$ and $O^p$, constructed for this example, are shown in Appendix C.

The correctness of the approach used to extract p-state transitions and observations, in Lines 16 and 19 of Algorithm 1, is
based on the following propositions.
Algorithm 1: Constructing POMDP transition function $T^p$ and observation function $O^p$

Input: $S^p$, $A^p$, $Z^p$, $\mathcal{D}_{LR}(T)$; transition probabilities for actions $\in A^p_1$; observation probabilities for actions $\in A^p_2$.
Output: POMDP transition function $T^p$ and observation function $O^p$.

1   Initialize $T^p$ as $|S^p| \times |S^p|$ identity matrix for each action.
2   Initialize $O^p$ as $|S^p| \times |Z^p|$ matrix of zeros for each action.
    /* Handle special cases */
3   for each $a_j \in A^p_3$ do
4       $T^p_j(*, absb) = 1$
5       $O^p_j(*, none) = 1$
6   end
7   for each action $a_j \in A^p_1$ do
8       $O^p_j(*, none) = 1$
9   end
10  for each $a_j \in A^p_2$ do
11      $O^p_j(absb, none) = 1$
12  end
    /* Handle normal transitions */
13  for each p-state $s_i \in S^p$ do
        /* Construct and set probabilities of p-state transitions */
14      Construct ASP program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_1))$.
15      Compute answer sets $\mathcal{AS}$ of ASP program.
16      From each $AS \in \mathcal{AS}$, extract p-state transition $\langle s_i, a_k, s_j \rangle$, and set the probability of $T^p_k[s_i, s_j]$.
        /* Construct and set probabilities of observations */
17      Construct ASP program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_2))$.
18      Compute answer sets $\mathcal{AS}$ of ASP program.
19      From each $AS \in \mathcal{AS}$, extract triple $\langle s_i, a_k, z_j \rangle$, and set the value of $O^p_k[s_i, z_j]$.
20  end
21  return $T^p$ and $O^p$.
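A minimal Python rendering of Algorithm 1 follows. It keeps the special cases of Lines 3-12 but, as an assumption for illustration, replaces the ASP programs of Lines 14-19 with two hypothetical callables, `successors(s, a)` and `observations(s, a)`, which return outcome probabilities directly; in the architecture, the outcomes come from answer sets and their probabilities from the statistics of Section 7.2. Tables are lists of rows indexed by position in $S^p$ and $Z^p$. One further assumption: for terminal actions, the sketch clears the identity row before setting the entry for absb, so that every row remains a probability distribution. The usage loosely follows the Appendix C example (0.95 for moves and tests) but tracks only the robot's cell.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Table = Dict[str, List[List[float]]]


def build_tables(S: Sequence[str], Z: Sequence[str],
                 A1: Sequence[str], A2: Sequence[str], A3: Sequence[str],
                 successors: Callable[[str, str], Dict[str, float]],
                 observations: Callable[[str, str], Dict[str, float]]) -> Tuple[Table, Table]:
    si = {s: i for i, s in enumerate(S)}
    zi = {z: k for k, z in enumerate(Z)}
    n, m = len(S), len(Z)
    actions = list(A1) + list(A2) + list(A3)
    # Lines 1-2: identity transition tables, all-zero observation tables.
    T = {a: [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)] for a in actions}
    O = {a: [[0.0] * m for _ in range(n)] for a in actions}
    # Lines 3-6: terminal actions lead to absb and yield no observation.
    for a in A3:
        for i in range(n):
            T[a][i] = [0.0] * n
            T[a][i][si["absb"]] = 1.0
            O[a][i][zi["none"]] = 1.0
    # Lines 7-9: state-changing actions yield no observation.
    for a in A1:
        for i in range(n):
            O[a][i][zi["none"]] = 1.0
    # Lines 10-12: knowledge-producing actions observe nothing in absb.
    for a in A2:
        O[a][si["absb"]][zi["none"]] = 1.0
    # Lines 13-20: normal transitions and observations (no action is possible in absb).
    for s in S:
        if s == "absb":
            continue
        for a in A1:
            row = [0.0] * n
            for s_next, p in successors(s, a).items():
                row[si[s_next]] = p
            T[a][si[s]] = row
        for a in A2:
            for z, p in observations(s, a).items():
                O[a][si[s]][zi[z]] = p
    return T, O


# Usage with two neighboring cells; only the robot's location is modeled.
def successors(s: str, a: str) -> Dict[str, float]:
    target = "c" + a.split("-")[1]          # "move-1" -> "c1"
    return {s: 1.0} if s == target else {target: 0.95, s: 0.05}


def observations(s: str, a: str) -> Dict[str, float]:
    # "test-0": is the robot in cell c0? Correct with probability 0.95.
    return {"yes": 0.95, "no": 0.05} if s == "c0" else {"yes": 0.05, "no": 0.95}


T, O = build_tables(["c0", "c1", "absb"], ["yes", "no", "none"],
                    ["move-0", "move-1"], ["test-0"], ["finish"],
                    successors, observations)
print(T["move-1"][0])  # [0.05, 0.95, 0.0]
print(O["test-0"][1])  # [0.05, 0.95, 0.0]
```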


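Proposition 3. [Extracting p-state transitions from answer sets]

• If $\langle s_i, a, s_j \rangle \in T^p$, then there is an answer set AS of program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_1))$ such that
$s_j = \{f(\bar{x}) = y : f(\bar{x}) = y \in AS \text{ and } f(\bar{x}) \text{ is basic}\}$.

• For every answer set AS of program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_1))$ and $s_j = \{f(\bar{x}) = y : f(\bar{x}) = y \in AS \text{ and } f(\bar{x}) \text{ is basic}\}$,
$\langle s_i, a, s_j \rangle \in T^p$.

Proposition 4. [Extracting observations from answer sets]

• If $\langle s_i, a, z_j \rangle \in O^p$, then there is an answer set AS of program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_2))$ such that
$z_j = \{f(\bar{x}) = y : f(\bar{x}) = y \in AS \text{ and } f(\bar{x}) \text{ is basic}\}$.

• For every answer set AS of program $\Pi(\mathcal{D}_{LR}(T), s_i, Disj(A^p_2))$ and $z_j = \{f(\bar{x}) = y : f(\bar{x}) = y \in AS \text{ and } f(\bar{x}) \text{ is basic}\}$,
$\langle s_i, a, z_j \rangle \in O^p$.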

It is possible to show that these propositions are true based on the definition of an answer set, the definition of the
zoomed system description $\mathcal{D}_{LR}(T)$, and the definition of the POMDP components.

Reward specification: the reward function $R^p$ assigns a real-valued reward to each p-state transition, as described in
Algorithm 2. Specifically, for any state transition with a non-zero probability in $T^p$:
Algorithm 2: Construction of POMDP reward function $R^p$

Input: $S^p$, $A^p$, and $T^p$; statistics regarding accuracy and time taken to execute non-terminal actions.
Output: Reward function $R^p$.

/* Consider each possible p-state transition */
for each $\langle s, a, s' \rangle \in S^p \times A^p \times S^p$ with $T^p(s, a, s') \neq 0$ do
    /* Consider terminal actions first */
    if $a \in A^p_3$ then
        if $s$ is a goal p-state then
            $R^p(s, a, s')$ = large positive value.
        else
            $R^p(s, a, s')$ = large negative value.
        end
    /* Rewards are costs for non-terminal actions */
    else
        Set $R^p(s, a, s')$ based on relative computational effort and accuracy.
    end
end
return $R^p$


1. If it involves a terminal action from $A^p_3$, the reward is a large positive (negative) value if this action is chosen
after (before) achieving the goal p-state.

2. If it involves a non-terminal action, the reward is a real-valued cost (i.e., negative reward) of action execution.

Here, any p-state $s \in S^p$ defined by a state $\delta$ of $\mathcal{D}_{LR}(T)$ that is a refinement of $\sigma_2$ in transition $T = \langle \sigma_1, a^H, \sigma_2 \rangle$ is
a goal p-state. In Appendix C, we assign a large positive reward (100) for executing finish when textbook tb_1 is in
the robot’s grasp, and a large negative reward (−100) for terminating before tb_1 has been grasped. We assign a fixed
cost (−1) for all other (i.e., non-terminal) actions. However, as stated earlier, this cost can be a heuristic function
of both relative computational effort and accuracy, using domain expertise and statistics collected experimentally.
For instance, statistics may indicate that a knowledge-producing action that determines an object’s color takes twice
as much time as the action that determines the object’s shape, which can be used to set $R^p(*, shape, *) = -1$ and
$R^p(*, color, *) = -2$. The reward function, in turn, influences (a) the rate of convergence during policy computation;
and (b) the accuracy of results during policy execution. Appendix C describes the reward function for a specific example.
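Algorithm 2, with the values used in Appendix C (+100, −100, and a cost of −1) and the timing heuristic from the color/shape example, can be sketched as below. The table layout is a per-action list of rows for $T^p$; the function names, the goal test, and the rule that a cost is the mean execution time divided by that of the fastest action are illustrative assumptions, not the exact procedure used in our experiments. Following item 1 above, the goal test is applied to the p-state in which the terminal action is chosen.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_rewards(S: Sequence[str], T: Dict[str, List[List[float]]],
                  terminal: Sequence[str], is_goal: Callable[[str], bool],
                  cost: Dict[str, float], big: float = 100.0
                  ) -> Dict[Tuple[str, str, str], float]:
    R = {}
    for a, rows in T.items():
        for i, s in enumerate(S):
            for j, s_next in enumerate(S):
                if rows[i][j] == 0.0:
                    continue  # only transitions with non-zero probability
                if a in terminal:
                    # Reward terminating after (not before) reaching a goal p-state.
                    R[(s, a, s_next)] = big if is_goal(s) else -big
                else:
                    R[(s, a, s_next)] = cost[a]
    return R


def costs_from_times(mean_times: Dict[str, float]) -> Dict[str, float]:
    """Cost proportional to mean execution time; the fastest action costs -1."""
    fastest = min(mean_times.values())
    return {a: -t / fastest for a, t in mean_times.items()}


print(costs_from_times({"shape": 1.5, "color": 3.0}))  # {'shape': -1.0, 'color': -2.0}

S = ["held", "not_held", "absb"]
T = {"grasp": [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0], [0.0, 0.0, 1.0]],
     "finish": [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]}
R = build_rewards(S, T, ["finish"], lambda s: s == "held", {"grasp": -1.0})
print(R[("held", "finish", "absb")], R[("not_held", "finish", "absb")])  # 100.0 -100.0
```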

Computational efficiency: Solving POMDPs can be computationally expensive, even with state-of-the-art approximate
solvers. For specific tasks such as path planning, it may also be possible to use specific heuristic or probabilistic
algorithms that are more computationally efficient than a POMDP. However, POMDPs provide (a) a principled and
quantifiable trade-off between accuracy and computational efficiency in the presence of uncertainty in both sensing
and actuation; and (b) a near-optimal solution if the POMDP’s components are modeled correctly. With a POMDP,
the computational efficiency can be improved further by “factoring” the p-state estimation problem into sub-problems
that model actions and observations influencing one fluent independently of those influencing other fluents. Such a
factoring of the problem is not always possible, e.g., when a robot is holding a textbook, the robot’s location
and the textbook’s location are not independent. Instead, in our architecture, we preserve such constraints while still
constructing a POMDP for only the relevant part of the domain, which significantly reduces the computational complexity of
solving the POMDP. Furthermore, many of the POMDPs required for a given domain can be precomputed, solved
and reused. For instance, if the robot has constructed a POMDP for the task of locating a textbook in a room, the 
POMDP for locating a different book (or even a different object) in the same room may only differ in the values of 
some transition probabilities, observation probabilities, and rewards. Note that this similarity between tasks may not 
hold in non-stationary domains, in which the elements of the POMDP tuple (e.g., set of p-states), and the collected 
statistics (e.g., transition probabilities), may need to be revised over time. 
Computational error: Although the outcomes of POMDP policy execution are non-deterministic, following an optimal
policy produced by an exact POMDP solver is most likely (among all such possible policies) to take the robot to
a goal p-state if the following conditions hold:

• The coarse-resolution transition diagram $\tau_H$ of the domain has been constructed correctly;

• The statistics collected in the initial training phase (Section 7.2) correctly model the domain dynamics; and

• The reward function is constructed to suitably reward desired behavior.

This statement is based on existing literature [7, 35, 43]. We use an approximate POMDP solver for computational
efficiency, and an exact belief update (Equation 27), which provides a bound on the regret (i.e., loss in value) achieved
by following the computed policy in comparison with the optimal policy [37]. We can thus only claim that the outcomes
of executing our policy are approximately correct with high probability. We can also provide a bound on the margin
of error EO. For instance, if the probability associated with a statement in the fine-resolution representation is p,
the margin of error in a commitment made to the history (in the coarse-resolution representation) based on this
statement is (1 − p). If a set of statements with probabilities $p_i$ is used to arrive at a conclusion that is committed to
$\mathcal{H}$, $(1 - \prod_i p_i)$ is the corresponding error.
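The error bound above is simple arithmetic; a minimal sketch with a hypothetical function name follows. Reading $\prod_i p_i$ as the probability that all the statements are correct assumes the statements are independent; that assumption is ours, made explicit here.

```python
from math import prod
from typing import Iterable


def commitment_error(probabilities: Iterable[float]) -> float:
    """Margin of error (1 - prod_i p_i) for a conclusion committed to the history."""
    ps = list(probabilities)
    if any(not 0.0 <= p <= 1.0 for p in ps):
        raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(ps)


print(round(commitment_error([0.95]), 4))        # 0.05
print(round(commitment_error([0.95, 0.95]), 4))  # 0.0975
```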


[Figure 4 diagram; annotations: (1) collect statistics of action outcomes in initial training phase; (2) perform refinement and zoom for T, construct POMDP and compute policy; uses non-monotonic logical inference to compute next abstract action to execute in current state to achieve the goal; select and execute concrete actions, perform probabilistic belief update until policy termination.]


Figure 4: The proposed architecture can be viewed as a logician and a statistician communicating through a controller. 
The architecture combines the complementary strengths of declarative programming and probabilistic models. 


9 Reasoning System and Control Loop of Architecture 

Next, we give a more detailed description of the reasoning system and control loop of our architecture for building
intelligent robots. For this description, we (once again) view a robot as consisting of a logician and a statistician, who
communicate through a controller, as described in Section 1 and shown in Figure 4. For any given goal, the logician
takes as input the system description $\mathcal{D}_H$ that corresponds to a coarse-resolution transition diagram $\tau_H$, the recorded history
$\mathcal{H}$ with initial state defaults (see Example 2), and the current coarse-resolution state $\sigma_1$ (potentially inferred from
observations). If recent recorded observations differ from the logician’s predictions, the discrepancies are diagnosed
and a plan comprising one or more abstract actions is computed to achieve the goal. Planning and diagnostics are
reduced to computing answer sets of the CR-Prolog program $\Pi(\mathcal{D}_H, \mathcal{H})$. For a given goal, the controller uses the
transition T corresponding to the next abstract action $a^H$ in the computed plan to zoom to $\mathcal{D}_{LR}(T)$, the part of the
randomized fine-resolution system description that is relevant to T. A POMDP is then constructed from
$\mathcal{D}_{LR}(T)$ and the learned probabilities, and solved to obtain a policy. The POMDP and the policy are communicated
to the statistician, who invokes the policy repeatedly to implement the abstract action $a^H$ as a sequence of (more)
concrete actions. When the POMDP policy is terminated, the corresponding observations are sent to the controller.
The controller performs inference in $\mathcal{D}_{LR}(T)$, recording the corresponding coarse-resolution action outcomes and
observations in the coarse-resolution history, which is used by the logician for subsequent reasoning.


Algorithm 3: Control loop

Input: coarse-resolution system description $\mathcal{D}_H$ and history $\mathcal{H}$; randomized fine-resolution system description
$\mathcal{D}_{LR}$; coarse-resolution description of the goal; coarse-resolution initial state $\sigma_1$.

Output: robot is in a state satisfying the goal; reports failure if this is impossible.

while goal is not achieved do
    Logician uses $\mathcal{D}_H$ and $\mathcal{H}$ to find a possible plan, $a^H_1, \ldots, a^H_n$, to achieve the goal.
    if no plan exists then
        return failure
    end
    i := 1
    continue1 := true
    while continue1 do
        Check pre-requisites of $a^H_i$.
        if pre-requisites not satisfied then
            continue1 := false
        else
            Controller zooms to $\mathcal{D}_{LR}(T)$, the part of $\mathcal{D}_{LR}$ relevant to transition $T = \langle \sigma_1, a^H_i, \sigma_2 \rangle$, and constructs a POMDP.
            Controller solves POMDP to compute a policy to implement $a^H_i$.
            continue2 := true
            while continue2 do
                Statistician invokes policy to select and execute an action, obtain observation, and update belief state.
                if terminal action executed then
                    Statistician communicates observations to the controller.
                    continue2 := false
                    i := i + 1
                    continue1 := (i < n + 1)
                end
            end
            Controller performs fine-resolution inference, recording action outcomes and observations in $\mathcal{H}$.
            $\sigma_1$ := current coarse-resolution state.
        end
    end
end
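The control flow of Algorithm 3 can be sketched in Python as below. This is an illustration under stated assumptions, not the architecture's implementation: the logician, the controller's zoom-and-POMDP step, and the statistician are hidden behind hypothetical callables (`find_plan`, `prerequisites_hold`, `run_pomdp`, `record`), and `max_replans` is an added safeguard that Algorithm 3 itself does not have. The toy world at the end, with deterministic and entirely hypothetical dynamics, replays the textbook scenario of Execution Example 2 in Section 10.2.

```python
from typing import Callable, List, Optional


def control_loop(goal_achieved: Callable[[], bool],
                 find_plan: Callable[[], Optional[List[str]]],
                 prerequisites_hold: Callable[[str], bool],
                 run_pomdp: Callable[[str], list],
                 record: Callable[[str, list], None],
                 max_replans: int = 10) -> bool:
    for _ in range(max_replans):
        if goal_achieved():
            return True
        plan = find_plan()            # logician: answer sets of the CR-Prolog program
        if plan is None:
            return False              # no plan exists: report failure
        for action in plan:
            if not prerequisites_hold(action):
                break                 # unexpected outcome: replan with the updated history
            observations = run_pomdp(action)  # zoom, build/solve POMDP, execute policy
            record(action, observations)      # add coarse-resolution outcomes to history
    return goal_achieved()


# Toy replay of Execution Example 2 (deterministic, hypothetical dynamics).
world = {"robot": "office", "book": "aux_library", "held": False}
ruled_out: set = set()
log: List[str] = []


def find_plan() -> Optional[List[str]]:
    # Defaults: textbooks are typically in main_library, otherwise in aux_library.
    for room in ("main_library", "aux_library"):
        if room not in ruled_out:
            return [f"move:{room}", "grasp", "move:office", "putdown"]
    return None


def prerequisites_hold(action: str) -> bool:
    # Carrying the book back and putting it down require holding it.
    return world["held"] if action in ("move:office", "putdown") else True


def run_pomdp(action: str) -> list:
    log.append(action)
    if action.startswith("move:"):
        world["robot"] = action.split(":")[1]
        if world["held"]:
            world["book"] = world["robot"]
        return []
    if action == "grasp":
        world["held"] = world["book"] == world["robot"]
        return [(world["robot"], world["held"])]
    world["held"] = False             # putdown
    world["book"] = world["robot"]
    return []


def record(action: str, observations: list) -> None:
    for room, found in observations:
        if not found:
            ruled_out.add(room)       # e.g., obs(loc(tb_1) != main_library, 2)


success = control_loop(lambda: world["book"] == "office" and not world["held"],
                       find_plan, prerequisites_hold, run_pomdp, record)
print(success, log)
```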


Algorithm 3 describes the overall control loop for achieving the assigned goal. Correctness of this algorithm (with a
certain margin of error) is ensured by:

1. Applying the planning and diagnostics algorithm discussed in Section 5.2 for planning with $\tau_H$;

2. Using the formal definitions of refinement and zoom described in Section 7; and

3. Using a POMDP to probabilistically plan an action sequence and executing it for each $a^H$ of the logician’s plan,
as discussed in Section 8.


The probabilistic planning is also supported by probabilistic state estimation algorithms that process inputs from 
sensors and actuators. For instance, the robot builds a map of the domain and estimates its position in the map using a 
Particle Filter algorithm for Simultaneous Localization and Mapping (SLAM) [49]. This algorithm represents the true
underlying probability distribution over the possible states using samples drawn from a proposal distribution. Samples 
more likely to represent the true state, determined based on the degree of match between the expected and actual sensor 
observations of domain landmarks, are assigned higher (relative) weights and re-sampled to incrementally converge 
to the true distribution. Implementations of the particle filtering algorithm are used widely in the robotics literature to 
track multiple hypotheses of system state. A similar algorithm is used to estimate the pose of the robot’s arm. On the 
physical robot, other algorithms are used to process specific sensor inputs. For instance, we use existing implementations
of algorithms to process camera images, which are the primary source of information to identify specific domain 
objects. The robot also uses an existing implementation of a SLAM algorithm to build a domain map and localize 
itself in the map. These algorithms are summarized in Section 10 when we discuss experiments on physical robots. 
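The weighting and re-sampling idea behind the particle filter can be illustrated with a one-dimensional localization sketch. This is a simplification introduced for illustration, not the SLAM implementation used on the robot: the map is a single known landmark, the sensor reports the distance to it, the motion and sensor noise levels are made-up values, and re-sampling is done at every step with `random.Random.choices`.

```python
import math
import random
from typing import List


def particle_filter_step(particles: List[float], control: float, measured_range: float,
                         landmark: float, rng: random.Random,
                         motion_noise: float = 0.1, sensor_noise: float = 0.2) -> List[float]:
    # 1. Sample from the proposal: propagate each particle through a noisy motion model.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # 2. Weight each particle by how well its expected range matches the measurement.
    weights = [math.exp(-((abs(landmark - p) - measured_range) ** 2) / (2 * sensor_noise ** 2))
               for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)      # degenerate case: fall back to uniform weights
    # 3. Re-sample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)
landmark, true_pos = 10.0, 0.0
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
for _ in range(5):
    true_pos += 1.0                               # robot moves one unit per step
    particles = particle_filter_step(particles, 1.0, abs(landmark - true_pos), landmark, rng)
estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # close to the true position, 5.0
```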


10 Experimental Setup and Results 

This section describes the experimental setup and results of evaluating the architecture’s capabilities. 


10.1 Experimental setup 

The proposed architecture was evaluated in simulation and on a physical robot. As stated in Section 8, statistics of
action execution, e.g., observed outcomes of all actions and computation time for knowledge-producing actions, are
collected in an initial training phase. These statistics are used by the controller to compute the relative utility of 
different actions, and the probabilities of obtaining different action outcomes and observations. The simulator uses 
these statistics to simulate the robot’s movement and perception. In addition, the simulator represents objects using 
probabilistic functions of features extracted from images, with the corresponding models being acquired in an initial 
training phase—see ED for more details about such models. 

In each experimental trial, the robot’s goal was to find and move specific objects to specific places—the robot’s 
location, the target object, and locations of domain objects were chosen randomly. An action sequence extracted from 
an answer set of the ASP program provides a plan comprising abstract actions, each of which is executed
probabilistically. Our proposed architecture, henceforth referred to as “PA”, was compared with: (1) POMDP-1; and (2)
POMDP-2, which revises POMDP-1 by assigning specific probability values to default statements to bias the initial
belief. The performance measures were: (a) success, the fraction (or %) of trials in which the robot achieved the
assigned goals; (b) planning time, the time taken to compute a plan to achieve the assigned goal; and (c) the average 
number of actions that were executed to achieve the desired goal. We evaluate the following three key hypotheses: 

H1 PA simplifies design in comparison with architectures based on purely probabilistic reasoning and increases
confidence in the correctness of the robot’s behavior; 

H2 PA achieves the assigned goals more reliably and efficiently than POMDP-1; and 

H3 Our representation for defaults improves reliability and efficiency in comparison with not using defaults or 
assigning specific probability values to defaults. 


We examine the first hypothesis qualitatively in the context of some execution traces grounded in the illustrative
domain described in Example 1 (Section 10.2). We then discuss the quantitative results corresponding to the experimental
evaluation of the other two hypotheses in simulation and on physical robots (Section 10.3).


10.2 Execution traces 

The following (example) execution traces illustrate some of the key capabilities of the proposed architecture. 










Execution Example 1. [Planning with default knowledge] 

Consider the scenario in which a robot is assisting with a meeting in the office, i.e., loc(rob_1, office), and is assigned
a goal state that contains:

loc(cup_1, office)

where the robot’s goal is to move coffee cup cup_1 to the office.

• The plan of abstract actions, as created by the logician, is: 

move(rob_1, kitchen)
grasp(rob_1, cup_1)
move(rob_1, office)
putdown(rob_1, cup_1)

Note that this plan uses initial state default knowledge that kitchenware are usually found in the kitchen. Each 
abstract action in this plan is executed by computing and executing a sequence of concrete actions. 

• To implement move(rob_1, kitchen), the controller constructs $\mathcal{D}_{LR}(T)$ by zooming to the part of $\mathcal{D}_{LR}$ relevant to
this action. For instance, only cells in the kitchen and the office are possible locations of rob_1, and move is the
only action that can change the physical state, in the fine-resolution representation.

• $\mathcal{D}_{LR}(T)$ is used to construct and solve a POMDP to obtain an action selection policy, which is provided to the
statistician. The statistician repeatedly invokes this policy to select actions (until a terminal action is selected)
that are executed by the robot. In the context of Figure 3(b), assume that the robot moved from cell c_1 ∈ office
to c_5 ∈ kitchen (through cell c_2 ∈ office) with high probability.

• The direct observation from the POMDP, directly_observed(rob_1, loc(rob_1), c_5) = true, is used by the
controller for inference in $\mathcal{D}_{LR}(T)$ and $\mathcal{D}_L$, e.g., to produce observed(rob_1, loc(rob_1), kitchen). The controller adds
this information to the coarse-resolution history of the logician, e.g., obs(loc(rob_1) = kitchen, 1). Since
the first abstract action has had the expected outcome, the logician sends the next abstract action in the plan,
grasp(rob_1, cup_1), to the controller for implementation.

• A similar sequence of steps is performed for each abstract action in the plan, e.g., to grasp cup_1, the robot locates
the coffee cup in the kitchen and then picks it up. Subsequent actions cause rob_1 to move cup_1 to the office,
and put cup_1 down to achieve the assigned goal.

Execution Example 2. [Planning with unexpected failure] 

Consider the scenario in which a robot in the office is assigned the goal of fetching textbook tb_1, i.e., the initial state
includes loc(rob_1, office), and the goal state includes:

loc(tb_1, office)

The coarse-resolution system description $\mathcal{D}_H$ and history $\mathcal{H}$, along with the goal, are passed on to the logician.

• The plan of abstract actions, as created by the logician, is: 

move(rob_1, main_library)
grasp(rob_1, tb_1)
move(rob_1, office)
putdown(rob_1, tb_1)

This plan uses the initial state default knowledge that textbooks are typically in the main_library (see Example 2).
Each abstract action in this plan is executed by computing and executing a sequence of concrete actions.
• Assume that loc(rob_1, main_library), i.e., that the robot is in the main_library after successfully executing the
first abstract action. To execute the grasp(rob_1, tb_1) action, the controller constructs $\mathcal{D}_{LR}(T)$ by zooming to the
part of $\mathcal{D}_{LR}$ relevant to this action. For instance, only cells in the main_library are possible locations of rob_1 and
tb_1 in the fine-resolution representation.


• $\mathcal{D}_{LR}(T)$ is used to construct and solve a POMDP to obtain an action selection policy, which is provided to the
statistician. The statistician repeatedly invokes this policy to select actions (until a terminal action is selected)
that are executed by the robot. In the context of Figure 3(b), if r_2 is the main_library, the robot may move to
and search for tb_1 in each cell in r_2, starting from its current location.


• The robot unfortunately does not find tb_1 in any of the cells in the main_library in the second step. These
observations from the POMDP, i.e., directly_observed(rob_1, loc(tb_1), c_i) = false for each c_i ∈ main_library, are
used by the controller for inference in $\mathcal{D}_{LR}(T)$, e.g., to produce observed(rob_1, loc(tb_1), main_library) =
false. The controller adds this information to the coarse-resolution history of the logician, e.g.,
obs(loc(tb_1) ≠ main_library, 2).

• The inconsistency caused by the observation is resolved by the logician using a CR rule, and a new plan
is created based on the second initial state default (see Example 2) that a textbook not in the main_library is
typically in the aux_library:


move(rob_1, aux_library)
grasp(rob_1, tb_1)
move(rob_1, office)
putdown(rob_1, tb_1)


• This time, the robot is able to successfully execute each abstract action in the plan, i.e., it is able to move to the
aux_library, find tb_1 and grasp it, move back to the office, and put tb_1 down to achieve the assigned goal.
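The step in Execution Example 2 in which cell-level observations become a coarse-resolution history entry can be sketched as follows. The function names, the four-cell room, and the literal formatting are hypothetical; the rule itself (a sighting in any cell places the object in the room, negative observations in every cell of the room rule the room out, and partial coverage commits nothing) is our illustrative reading of the inference the controller performs in $\mathcal{D}_{LR}(T)$.

```python
from typing import Dict, List, Optional


def room_observation(room: str, cells_of: Dict[str, List[str]],
                     seen: Dict[str, bool]) -> Optional[bool]:
    """True if the object was seen in some cell of the room, False if every
    cell of the room was observed without it, None if coverage is partial."""
    cells = cells_of[room]
    if any(seen.get(c) is True for c in cells):
        return True
    if all(seen.get(c) is False for c in cells):
        return False
    return None


def history_literal(obj: str, room: str, holds: bool, step: int) -> str:
    op = "=" if holds else "!="
    return f"obs(loc({obj}) {op} {room}, {step})"


cells_of = {"main_library": ["c1", "c2", "c3", "c4"]}
seen = {c: False for c in cells_of["main_library"]}   # tb_1 not found in any cell
result = room_observation("main_library", cells_of, seen)
if result is not None:
    print(history_literal("tb_1", "main_library", result, 2))
# obs(loc(tb_1) != main_library, 2)
```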

Both these examples illustrate key advantages provided by the formal definitions, e.g., of the different system descriptions
and the tight coupling between the system descriptions, which are part of the proposed architecture:


1. Once the designer has provided the domain-specific information, e.g., for refinement or computing probabilities 
of action outcomes, no further input is necessary to automate planning, diagnostics, and execution for any given 
goal. 

2. Attention is automatically directed to only the relevant part of the available knowledge at the appropriate resolution.
For instance, reasoning by the logician (statistician) is restricted to a coarse-resolution (fine-resolution)
system description. It is thus easier to understand, and to identify and fix errors in, the observed behavior, in
comparison with architectures that consider all the available knowledge or only support probabilistic reasoning [53].


3. There is smooth transfer of control and relevant knowledge between the components of the architecture, which supports confidence in the correctness of the robot’s behavior. Also, the proposed methodology supports the use of this architecture on different robots in different domains; e.g., Section 10.3 describes the result of using this architecture on mobile robots in two different indoor domains.


Next, we describe the experimental evaluation of the hypotheses H2 and H3 in simulation and on a mobile robot. 


10.3 Experimental results 

To evaluate hypothesis H2, we first compared PA with POMDP-1 in a set of trials in which the robot’s initial position is known but the position of the object to be moved is unknown. The solver used in POMDP-1 was evaluated with different fixed amounts of time for computing action policies. Figure 5 summarizes the results; each point is the average of 1000 trials, and we set (for ease of interpretation) each room to have four cells. The brown-colored plots in Figure 5 represent the ability to successfully achieve the assigned goal (y-axis on the left), as a function of the number of cells in the domain. The blue-colored plots show the number of actions executed before termination. For the plots corresponding to POMDP-1, the number of actions the robot is allowed to execute before it has to terminate is set to 50. We note that PA significantly improves the robot’s ability to achieve the assigned goal in comparison with POMDP-1. As the number of cells (i.e., the size of the domain) increases, it becomes computationally difficult to generate good policies with POMDP-1: the robot needs a greater number of actions to achieve the goal, and there is a loss in accuracy if the limit on the number of actions the robot can execute before termination is reduced. While using POMDP-1, any incorrect observations (e.g., false positive sightings of objects) significantly impact the ability to successfully complete the trials. PA, on the other hand, focuses the robot’s attention on relevant regions of the domain (e.g., specific rooms and cells), and it is thus able to recover from errors and operate efficiently.

Figure 5: Ability to successfully achieve the assigned goal, and the number of actions executed before termination, as a function of the number of cells in the domain. PA significantly increases accuracy and reduces the number of actions executed, in comparison with POMDP-1, as the number of cells in the domain increases.

Figure 6: Planning time as a function of the number of rooms and the number of objects in the domain: (a) using all knowledge; (b) using relevant knowledge; (c) using some knowledge. PA only uses relevant knowledge for reasoning, and is thus able to scale to a larger number of rooms and objects.

Next, we evaluated the time taken by PA to generate a plan as the size of the domain increases. We characterize domain size based on the number of rooms and the number of objects in the domain. We conducted three sets of experiments in which the robot reasons with: (1) all available knowledge of domain objects and rooms; (2) only knowledge relevant to the assigned goal, e.g., if the robot knows an object’s default location, it need not reason about other objects and rooms in the domain to locate this object; and (3) relevant knowledge and knowledge of an additional 20% of randomly selected domain objects and rooms. Figures 6(a)-6(c) summarize these results. We observe that using just the knowledge relevant to the goal to be accomplished significantly reduces the planning time. PA supports the identification of such knowledge based on the refinement and zooming operations described in Section 7. As a result, robots equipped with PA will be able to generate appropriate plans for domains with a large number of rooms and objects. Furthermore, if we only use a probabilistic approach (POMDP-1), it soon becomes computationally intractable to generate a plan for domains with many objects and rooms. These results are not shown in Figure 6, but they are documented in prior papers evaluating just the probabilistic component of the proposed architecture [47, 52].

Figure 7: Effect of using default knowledge: principled representation of defaults significantly reduces the number of actions (and thus time) for achieving the assigned goal.

Figure 8: Ability to achieve goals, and number of actions executed, using only POMDPs, when different probability values are assigned to default statements and the ground truth locations of objects perfectly match the default locations. The number of actions decreases and success (%) increases as the probability value increases.

To evaluate hypothesis H3, i.e., to evaluate our representation and use of default knowledge, we first conducted trials in which PA was compared with PA*, a version that does not include any default knowledge; e.g., when the robot is asked to fetch a textbook, there is no prior knowledge regarding the location of textbooks, and the robot explores the closest location first. Figure 7 summarizes the average number of actions executed per trial as a function of the number of rooms in the domain; each sample point in this figure is the average of 10000 trials. The goal in each trial is (as before) to move a specific object to a specific place. We observe that our (proposed) representation and use of default knowledge significantly reduces the number of actions (and thus time) required to achieve the assigned goal.

Next, PA was compared with POMDP-2, a version of POMDP-1 that assigns specific probability values to default knowledge (e.g., “textbooks are in the library with probability 0.9”) and suitably revises the initial belief state. The goal (once again) was to find and move objects to specific locations, and we measured the ability to successfully achieve the assigned goal and the number of actions executed before termination. Figures 8 and 9 summarize the corresponding results under two extreme cases representing a perfect match (mismatch) between the default locations and ground truth locations of objects. In Figure 8, the ground truth locations of target objects (unknown to the robot) match the default locations of the objects, i.e., there are no exceptions to the default statements. We observe that as the probability assigned to the default statement increases, the number of actions executed by the robot decreases and the fraction of trials completed successfully increases. However, for larger values along the x-axis, the difference in the robot’s performance for two different values of the probability (assigned to defaults) is relatively small. In Figure 9, the ground truth locations of the target objects never match the default locations of the objects, i.e., unknown to the robot, all trials correspond to exceptions to the default knowledge. In this case, the robot executes many more actions before termination and succeeds in a smaller fraction of trials as the probability value assigned to default statements increases.

We also repeated these experimental trials after varying the extent to which the ground truth locations of objects matched their default locations. We noticed that when the probability assigned to default statements accurately reflects the ground truth, the number of trials in which the robot successfully achieves the goal increases and approaches the performance obtained with PA. However, recall that computing the probabilities of default statements accurately takes a lot of time and effort. Also, these probabilities may change over time, and the robot’s ability to achieve the assigned goals may be sensitive to these changes, making it difficult to predict the robot’s behavior with confidence. In addition, it is even more challenging to accurately represent and efficiently use probabilistic information about prioritized defaults (e.g., Example 2). In general, we observed that the effect of assigning a probability value to defaults varies unpredictably, depending on factors such as (a) the numerical value chosen; and (b) the degree of match between the ground truth and the default information. For instance, if a large probability is assigned to the default knowledge that books are typically in the library, but the book the robot has to move is an exception to the default (e.g., a cookbook), it takes significantly longer for POMDP-2 to revise (and recover from) the initial belief. PA, on the other hand, supports elegant representation of, and reasoning with, defaults and exceptions to these defaults.

Figure 9: Ability to achieve goals, and number of actions executed, using only POMDPs, when different probability values are assigned to default statements and the ground truth locations of objects never match the default locations. The number of actions increases and success (%) decreases as the probability value increases.
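The slow recovery of POMDP-2 in the cookbook example can be illustrated with a small Bayesian calculation. This is a hypothetical sketch, not the POMDP-2 model used in the experiments: it assumes a single binary hypothesis (the object is at its default location or elsewhere), independent search outcomes, and made-up sensor parameters (`miss` is the probability of not seeing an object that is present, `false_pos` the probability of a spurious sighting). If $p$ is the probability assigned to the default, then after $k$ consecutive negative observations at the default location, Bayes' rule gives

$$b_k = \frac{p \cdot miss^k}{p \cdot miss^k + (1 - p)(1 - false\_pos)^k},$$

and the function `searches_to_abandon_default` (a hypothetical name) counts how many such observations are needed before $b_k$ falls below a threshold.

```python
def posterior_at_default(prior: float, miss: float, false_pos: float, k: int) -> float:
    """Belief that the object is at its default location after k consecutive
    negative observations there (Bayes' rule, observations assumed independent)."""
    at_default = prior * miss ** k
    elsewhere = (1.0 - prior) * (1.0 - false_pos) ** k
    return at_default / (at_default + elsewhere)


def searches_to_abandon_default(prior: float, miss: float = 0.2, false_pos: float = 0.05,
                                threshold: float = 0.5, max_k: int = 1000) -> int:
    """Smallest k for which the belief in the default falls below `threshold`.
    The default parameter values are illustrative assumptions only."""
    for k in range(max_k + 1):
        if posterior_at_default(prior, miss, false_pos, k) < threshold:
            return k
    raise ValueError("belief did not fall below the threshold within max_k steps")


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 0.999):
        print(p, searches_to_abandon_default(p))
```

With these made-up parameters, the number of negative observations needed grows with the assigned probability (1, 2, 3 and 5 for $p$ = 0.5, 0.9, 0.99 and 0.999), which mirrors the qualitative trend described above; the numbers themselves have no connection to the experimental results. In PA, by contrast, a default is simply not applied once an observation contradicts it, so no numerical prior has to be overcome.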

Robot Experiments: In addition to the trials in simulated domains, we implemented and evaluated PA with POMDP-1 on physical robots using the Robot Operating System (ROS). We conducted experimental trials with two robot platforms (see Figure 1) in variants of the domain described in Example 1. Visual object recognition is based on learned object models that consist of appearance-based and contextual visual cues [?]. Since, in each trial, the robot’s initial location and the target object(s) are chosen randomly, it is difficult to compute a meaningful estimate of variance, and statistical significance is therefore established through paired trials. In each paired trial, for each approach being compared (e.g., PA or POMDP-1), the target object(s), the robot’s initial location, and the locations of domain objects are the same, and the robot has the same initial domain knowledge.

First, we conducted 50 trials on two floors of our Computer Science department building. This domain includes places in addition to those included in our illustrative example; e.g., Figure 1(a) shows a subset of the domain map of the third floor of the building, and Figure 1(b) shows the Peoplebot wheeled robot platform used in these trials. The robot is equipped with a stereo camera, laser range finder, microphone, speaker, and a laptop running Ubuntu Linux that performs all the processing. The domain maps are learned and revised by the robot using laser range finder data and the existing ROS implementation of a SLAM algorithm [?]. This robot has a manipulator arm that can be moved to reachable 3D locations relative to the robot. However, since robot manipulation is not a focus of this work, once the robot is next to the desired object, it extends its gripper and asks for the object to be placed in it. For experimental trials on the third floor, we considered 15 rooms, which include faculty offices, research labs, common areas and a corridor. To make it feasible to use POMDP-1 in such large domains, we used our prior work on a hierarchical decomposition of POMDPs for visual sensing and information processing that supports automatic belief propagation across the levels of the hierarchy and model generation in each level of the hierarchy [47, 52]. The experiments included paired trials; e.g., over 15 trials (each), POMDP-1 takes 1.64 times as much time as PA (on average) to move specific objects to specific places. For these paired trials, this 39% reduction in execution time provided by PA is statistically significant: p-value = 0.0023 at the 95% significance level.
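The reported percentage follows from the time ratio by simple arithmetic: if POMDP-1 takes 1.64 times as long as PA, then PA needs $1/1.64 \approx 0.61$ of POMDP-1's time, a reduction of $1 - 1/1.64 \approx 0.39$, i.e., 39%. The sketch below reproduces this calculation and, for completeness, computes the statistic of a paired t-test, one standard way of analyzing paired trials. The text does not state which test was used, so this choice, the function names `reduction_from_ratio` and `paired_t_statistic`, and the sample numbers are assumptions for illustration only; they are not the experimental data.

```python
import math
import statistics
from typing import Sequence


def reduction_from_ratio(ratio: float) -> float:
    """If the baseline takes `ratio` times as long as PA, return the fractional
    reduction in execution time provided by PA, i.e. 1 - 1/ratio."""
    return 1.0 - 1.0 / ratio


def paired_t_statistic(baseline: Sequence[float], proposed: Sequence[float]) -> float:
    """t statistic of a paired t-test on per-trial differences (baseline - proposed).
    Index i of both sequences must refer to the same paired trial (same target
    object, initial location and initial domain knowledge)."""
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need at least two paired trials of equal length")
    diffs = [b - p for b, p in zip(baseline, proposed)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


if __name__ == "__main__":
    print(round(reduction_from_ratio(1.64), 2))  # 0.39
    print(round(reduction_from_ratio(2.3), 2))   # 0.57
    # Made-up execution times (seconds) for three paired trials:
    print(paired_t_statistic([3.0, 4.0, 5.0], [1.0, 2.0, 2.0]))
```

Turning the statistic into a p-value requires the t distribution with $n - 1$ degrees of freedom; `scipy.stats.ttest_rel` performs the complete test if SciPy is available. The same arithmetic turns the ratio of 2.3 reported for the second platform into a reduction of about 57% in execution time.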

Consider a trial in which the robot’s objective is to bring a specific textbook to the place named study_corner. The robot uses default knowledge to create a plan of abstract actions that causes the robot to move to and search for the textbook in the main_library. When the robot does not find this textbook in the main_library after searching using a suitable POMDP policy, replanning by the logician causes the robot to investigate the aux_library. The robot finds the desired textbook in the aux_library and moves it to the target location. A video of such an experimental trial can be viewed online at http://youtu.be/8zL4R8te6wg

To explore the applicability of PA in different domains, we also conducted 40 experimental trials using the Turtlebot wheeled robot platform in Figure 1(c) in a variant of the illustrative domain in Example 1. This domain had three rooms in the Electrical Engineering department building, arranged to mimic a setting in which a robot operates as a butler, with additional objects (e.g., tables, chairs, food items, etc.). The robot was equipped with a Kinect (RGB-D) sensor, a laser range finder, and a laptop running Ubuntu Linux that performs all the processing. As before, the robot used the ROS implementation of a SLAM algorithm, and a hierarchical decomposition of POMDPs for POMDP-1. This robot did not have a manipulator arm; once it reached a location next to the location of the desired object, it asked for the object to be placed on it. The experiments included paired trials; e.g., in 15 paired trials, POMDP-1 takes 2.3 times as much time as PA (on average) to move specific objects to specific places. This reduction in execution time by PA is statistically significant at the 95% significance level.

Consider a trial in which the robot’s goal was to fetch a bag of crisps for a human. The robot uses default knowledge about the location of the bag of crisps to create a plan of abstract actions that causes the robot to first move to the kitchen and search for the bag of crisps. The robot finds the bag of crisps, asks for the bag to be placed on it (since it has no manipulator), and moves back to table 1 in lab 1 (the location of the human who wanted the crisps), only to be told that it has brought a bag of chocolates instead. The robot diagnoses the cause of this error (the human gave it the incorrect bag in the kitchen), goes back, and fetches the correct bag (of crisps) this time. A video of this trial can be viewed online at https://vimeo.com/136990534

11 Conclusions 

This paper described a knowledge representation and reasoning architecture that combines the complementary strengths of declarative programming and probabilistic graphical models. The architecture is based on tightly-coupled transition diagrams that represent domain knowledge, and the robot’s abilities and goals, at two levels of granularity. The architecture makes several key contributions.

• Action language $\mathcal{AL}_d$ is extended to support non-Boolean fluents and non-deterministic causal laws, and is used to describe the coarse-resolution and fine-resolution transition diagrams.

• The notion of history of a dynamic domain is extended to include default knowledge in the initial state, and a model of this history is defined. These definitions are used to define a notion of explanation of unexpected observations, and to provide an algorithm for coarse-resolution planning and diagnostics. The algorithm is based on the translation of a history into a program of CR-Prolog and on the computation of answer sets of this program. The desired plan and, if necessary, explanations are extracted from an answer set of this program.

• A formal definition is provided of one transition diagram being a refinement of another transition diagram, and 
the fine-resolution diagram is defined as a refinement of the coarse-resolution transition diagram of the domain. 

• The randomization of the fine-resolution transition diagram is defined, and an algorithm is provided for the experimental collection of statistics. These statistics are used to compute the probabilities of action outcomes and observations at the fine resolution.
• A formal definition is provided for zooming to a part of the randomized fine-resolution diagram relevant to
the execution of any given coarse-resolution (abstract) action. This definition is used to automate the zoom 
operation during execution of the coarse-resolution plan. 

• We also provide an algorithm that uses the computed probabilities and the zoomed part of the fine-resolution 
transition diagram, to automatically construct data structures appropriate for the probabilistic implementation 
of any given abstract action. The outcomes of probabilistic reasoning update the coarse-resolution history for 
subsequent reasoning. 

• Finally, in what is possibly one of our major contributions, we articulate a general methodology for the design of software components of robots that are re-taskable and robust. This methodology simplifies the use of this architecture in other domains, provides a path to predict the robot’s behavior, and thus increases confidence in the correctness of the robot’s behavior.

In this paper, the domain representation for non-monotonic logical reasoning at coarse resolution is translated to a CR-Prolog program, while the representation for probabilistic reasoning is translated to a POMDP. These choices allow us to reason reliably and efficiently with hierarchically organized knowledge, and to provide a single framework for inference, planning and explanation generation, and for a quantifiable trade-off between accuracy and computational efficiency in the presence of probabilistic models of uncertainty in sensing and actuation. Experimental results in simulation and on physical robots indicate that the architecture supports reasoning at the sensorimotor level and the cognitive level with violation of defaults, noisy observations and unreliable actions, and has the potential to scale well to complex domains.

The proposed architecture opens up many directions for further research, some of which relax the constraints imposed in the design of our current architecture. First, we will further explore the tight coupling between the transition diagrams, and between logical and probabilistic reasoning, in dynamic domains. We have, for instance, explored different resolutions for reasoning probabilistically [?], and investigated the inference, planning and diagnostics capabilities of architectures that reason at different resolutions [53]. However, we have so far not explored non-stationary domains, a limiting constraint that we seek to relax in future work. Second, our architecture has so far focused on a single robot, although we have instantiated the architecture in different domains. Another direction of further research is to extend the architecture to enable collaboration between a team of robots working towards a shared goal. It is theoretically possible to extend our architecture to work on multiple robots, but doing so will open up challenging questions and choices regarding communication (between robots) and propagation of beliefs of a robot and its teammates. Third, the proposed architecture has focused on representation and reasoning with incomplete knowledge, but a robot collaborating with humans in a dynamic domain also needs to be able to learn from its experiences. Preliminary work in this direction, e.g., based on combining relational reinforcement learning with declarative programming, has provided some promising results [?], and we seek to further explore this direction of work in the future. The long-term objective is to better understand the coupling between non-monotonic logical reasoning and probabilistic reasoning, and to use this understanding to develop architectures that enable robots to assist humans in complex domains.
A Proof of Proposition 1

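In this section, we prove Proposition 1, which states that:

A path $M = \langle \sigma_0, a_0, \sigma_1, \ldots, \sigma_{n-1}, a_{n-1}, \sigma_n \rangle$ of $\tau(\mathcal{D})$ is a model of history $\mathcal{H}^n$ iff there is an answer set $AS$ of a program $\Pi(\mathcal{D}, \mathcal{H})$ such that:

1. A fluent literal $(f = y) \in \sigma_i$ iff $val(f, y, i) \in AS$,

2. A fluent literal $(f \neq y_1) \in \sigma_i$ iff $val(f, y_2, i) \in AS$ for some $y_2 \neq y_1$,

3. An action $e \in a_i$ iff $occurs(e, i) \in AS$.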

For our proof, we will need the following Lemma: 

Lemma 1. Let $\sigma_0$ be a state of the transition diagram of system description $\mathcal{D}$, $\langle a_0, \ldots, a_{n-1} \rangle$ be a sequence of actions from $\mathcal{D}$'s signature, and

$$\mathcal{P} =_{def} \Pi(\mathcal{D}) \cup val(\sigma_0, 0) \cup \{occurs(a_i, i) : 0 \leq i < n\}.$$

If $\mathcal{D}$ is well-founded, then every answer set of $\mathcal{P}$ defines a path of $\tau(\mathcal{D})$.

Proof of the proposition.*

First, we use the Splitting Set Theorem for CR-Prolog [?] to simplify $\Pi(\mathcal{D}, \mathcal{H})$ by eliminating from it occurrences of atoms formed by relations $obs$ and $hpd$. Let us denote the set of such atoms by $U$. To apply the theorem, it is sufficient to notice that no cr-rule of the program contains atoms from $U$ and that rules whose heads are in $U$ have empty bodies. Hence, $U$ is a splitting set of $\Pi(\mathcal{D}, \mathcal{H})$. By the Splitting Set Theorem, set $AS$ is an answer set of $\Pi(\mathcal{D}, \mathcal{H})$ iff $AS = Obs \cup AS'$, where $Obs$ is the collection of atoms formed by $obs$ and $hpd$ from history $\mathcal{H}$, and $AS'$ is an answer set of the program $\Pi_1$ obtained from $\Pi(\mathcal{D}, \mathcal{H})$ by:

• Replacing every ground instance $val(f(\bar{x}), y, 0) \leftarrow obs(f(\bar{x}) = y, 0)$ of the rule in Statement 13, such that $obs(f(\bar{x}) = y, 0) \in Obs$, by $val(f(\bar{x}), y, 0)$, and removing all the remaining ground instances of this rule.

• Replacing every ground instance

$$\leftarrow val(f, y_1, i),\ obs(f = y_2, i),\ y_1 \neq y_2.$$

of the rule in Statement 15, such that $obs(f = y_2, i) \in Obs$ and $y_1 \neq y_2$, by $\leftarrow val(f, y_1, i)$, and removing all the remaining ground instances of this rule.

• Replacing every ground instance $occurs(a, i) \leftarrow hpd(a, i)$ of the rule in Statement 16, such that $hpd(a, i) \in Obs$, by $occurs(a, i)$, and removing all the remaining ground instances of this rule.


Set $AS$ defines a path $M$ of $\tau(\mathcal{D})$ iff $AS'$ defines the path $M$ of $\tau(\mathcal{D})$. Our goal now is to find a state $\sigma_0$ of $\tau(\mathcal{D})$ such that $AS'$ is an answer set of the program:

$$\mathcal{P} =_{def} \Pi(\mathcal{D}) \cup val(\sigma_0, 0) \cup \{occurs(a_i, i) : hpd(a_i, i) \in \mathcal{H} \text{ and } 0 \leq i < n\}.$$

Once this is established, the conclusion of the proposition will follow immediately from Lemma 1.

Let $\sigma_0 = \{f(\bar{x}) = y : val(f(\bar{x}), y, 0) \in AS'\}$, and consider a series of transformations of $\Pi_1$ such that $AS'$ is an answer set of each program in this series and the last program of the series is $\mathcal{P}$.

* This proof uses the observations that are listed later in this section. They are easy to prove, and either belong to the folklore of the field or appear as lemmas in early ASP papers.
Step 1:

By the definition of an answer set of CR-Prolog, $AS'$ is an answer set of an ASP program $\Pi_2$ obtained from $\Pi_1$ by replacing its CR-rules with a (possibly empty) collection of their ASP counterparts, i.e., rules of the form:

$$val(f(\bar{x}), y_k, 0) \leftarrow val(body, 0),\ range(f, c),\ member(y_k, c),\ y_k \neq y. \qquad (28)$$


Step 2:

From Observation 1, we have that $AS'$ is an answer set of $\Pi_2$ iff $AS'$ is an answer set of the program $\Pi_3$ obtained from $\Pi_2$ by removing rules of the form $\leftarrow val(f, y_1, i)$.

Step 3:

From Observation 2 and the definition of $\sigma_0$, we have that $AS'$ is an answer set of $\Pi_3$ iff $AS'$ is an answer set of the program

$$\Pi_4 = \Pi_3 \cup val(\sigma_0, 0).$$


Step 4:

The fact that $AS'$ satisfies the rules of $\Pi_4$, together with the definition of $\sigma_0$, implies that every ground instance of the rules in Statements 11 and 28 either has a literal in the body which does not belong to $AS'$ or a literal in the head which is an atomic statement in $\Pi_4$. Hence, by Observations 3 and 4, $AS'$ is an answer set of a program $\Pi_5$ obtained from $\Pi_4$ by removing these instances. This completes our transformation.


Note that program $\Pi_5$ is of the form:

$$\Pi(\mathcal{D}) \cup val(\sigma_0, 0) \cup \{occurs(a_i, i) : 0 \leq i < n\}.$$

To show that $\Pi_5$ is equal to program $\mathcal{P}$ from Lemma 1, it remains to be shown that $\sigma_0$ is a state of $\tau(\mathcal{D})$, i.e., that $\sigma_0$ is an interpretation of the signature of $\mathcal{D}$ that is the unique answer set of program $\Pi_c(\mathcal{D}) \cup \sigma_0^{nd}$. To prove that $\sigma_0$ is an interpretation, we need to show that for every $f(\bar{x})$ there is $y$ such that $f(\bar{x}) = y \in \sigma_0$. Consider three cases:


1. There is $y$ such that $f(\bar{x}) = y$ is the head of a default. In this case, there is a rule $r$ of the form in Statement 11 in $\Pi_2$ obtained from this default. There are again two possibilities:

   • The body of $r$ is satisfied by $AS'$. In this case, $val(f(\bar{x}), y, 0) \in AS'$ and hence $f(\bar{x}) = y \in \sigma_0$.

   • There is some literal in the body of $r$ which is not satisfied by $AS'$. In this case, Statement 14 guarantees that $val(f(\bar{x}), y_i, 0) \in AS'$ for some $y_i$ and, hence, $f(\bar{x}) = y_i \in \sigma_0$.

2. $f(\bar{x})$ is a basic fluent and there is no atom formed by $f(\bar{x})$ which belongs to the head of some default (Statement 7). In this case, the existence of $y$ such that $val(f(\bar{x}), y, 0) \in AS'$, and hence $f(\bar{x}) = y \in \sigma_0$, is guaranteed by Statement 14.

3. $f(\bar{x})$ is a defined fluent. In this case, its value is assigned by one of the existing rules (e.g., the CWA).


To show that $\sigma_0$ is the unique answer set of $R_1 = \Pi_c(\mathcal{D}) \cup \sigma_0^{nd}$, we first observe that $\sigma_0$ is an answer set of $R_1$ iff $val(\sigma_0, 0)$ is an answer set of the encoding $R_2$ of $R_1$, which consists of ground instances of rules corresponding to state constraints, definitions and the CWA, and facts of the form $val(f(\bar{x}), y, 0)$ where $f(\bar{x})$ is a basic fluent and $f(\bar{x}) = y \in \sigma_0$. To finish proving that $\sigma_0$ is a state, we need to show that $val(\sigma_0, 0)$ is an answer set of $R_2$. To do that, it is sufficient to notice that $val(\sigma_0, 0)$ is a splitting set of program $\Pi_5$ and hence, by the Splitting Set Theorem, $AS' = AS_0 \cup AS_1$, where $AS_0$ is an answer set of $R_2$ and $AS_1$ does not contain atoms with time step 0. Clearly, $AS_0 = val(\sigma_0, 0)$. Finally, we need to recall that $\mathcal{D}$ is well-founded and, hence, by the definition of well-foundedness, $R_2$ cannot have multiple answer sets.

Observation 1: If $\Pi$ is an ASP program and $C$ is a collection of rules with empty heads, then $AS'$ is an answer set of $\Pi \cup C$ iff $AS'$ is an answer set of $\Pi$ which satisfies the rules of $C$.

Observation 2: If $AS'$ is an answer set of an ASP program $\Pi$ and $S_0 \subseteq AS'$, then $AS'$ is an answer set of the program obtained from $\Pi$ by adding to it the encoding of literals from $S_0$.
Observation 3: If $AS'$ is an answer set of an ASP program $\Pi$, then $AS'$ is an answer set of a program obtained from $\Pi$ by removing rules whose bodies are not satisfied by $AS'$.

Observation 4: If $AS'$ is an answer set of an ASP program $\Pi$, then $AS'$ is an answer set of a program obtained from $\Pi$ by removing rules whose heads have occurrences of atoms which are facts of $\Pi$.

To formulate the next observation, we need some notation and terminology. If $\alpha$ is a mapping from atoms to atoms, then for any rule $r$, by $\alpha(r)$ we mean the rule obtained from $r$ by applying $\alpha$ to all occurrences of atoms in $r$. Similarly for a collection of literals.

ASP programs $P_1$ and $P_2$ are called isomorphic if there is a one-to-one correspondence $\alpha$ between literals of $P_1$ and literals of $P_2$ such that a rule $r \in P_1$ iff the rule $\alpha(r) \in P_2$ (i.e., $r \in P_1$ implies that $\alpha(r) \in P_2$, and $r \in P_2$ implies that $\alpha^{-1}(r) \in P_1$).

Observation 5: If $P_1$ and $P_2$ are isomorphic with an isomorphism $\alpha$, then $AS'$ is an answer set of $P_1$ iff $\alpha(AS')$ is an answer set of $P_2$.
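Observations 1 and 3 can be checked mechanically on small ground programs. The sketch below is a hypothetical brute-force solver for ground normal programs with constraints (no disjunction, no cr-rules, no classical negation); it is not a tool used in this paper, and the helper names `rule`, `least_model` and `answer_sets` are illustrative. It tests every candidate set of atoms against its Gelfond-Lifschitz reduct, so it is exponential and only suitable for toy programs. In the usage example, adding the constraint $\leftarrow p$ to the program $\{p \leftarrow not\ q.\ \ q \leftarrow not\ p.\}$ keeps exactly those answer sets of the original program that satisfy the constraint, as Observation 1 states.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# A ground rule (head, positive body, negative body); head None = constraint.
Rule = Tuple[Optional[str], FrozenSet[str], FrozenSet[str]]


def rule(head: Optional[str], pos: Iterable[str] = (), neg: Iterable[str] = ()) -> Rule:
    return (head, frozenset(pos), frozenset(neg))


def least_model(definite: List[Rule]) -> Optional[Set[str]]:
    """Least model of a program without 'not'; None if some constraint fires."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos, _neg in definite:
            if pos <= model:
                if head is None:
                    return None  # the body of a constraint holds: no model
                if head not in model:
                    model.add(head)
                    changed = True
    return model


def answer_sets(program: List[Rule]) -> List[Set[str]]:
    """Enumerate answer sets by testing every candidate set of atoms against the
    Gelfond-Lifschitz reduct (exponential: for toy programs only)."""
    atoms = sorted({a for h, p, n in program for a in p | n | ({h} if h else set())})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            s = set(candidate)
            # Reduct: drop rules with 'not a' for some a in s; strip remaining 'not'.
            reduct = [(h, p, frozenset()) for h, p, n in program if not (n & s)]
            if least_model(reduct) == s:
                found.append(s)
    return found


if __name__ == "__main__":
    base = [rule("p", neg=["q"]), rule("q", neg=["p"])]  # p :- not q.  q :- not p.
    constraint = [rule(None, pos=["p"])]                  # :- p.
    print(answer_sets(base))               # [{'p'}, {'q'}]
    print(answer_sets(base + constraint))  # [{'q'}]
```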

Proof of Lemma 1 (sketch): Let $\sigma_0$ and $\mathcal{P}$ be as in Lemma 1. To simplify the argument, we assume that $\mathcal{D}$ contains no statics; these can be eliminated using the splitting set theorem without any effect on statements formed by relations $val$ and $occurs$.

By $\mathcal{P}_n$ we denote the grounding of $\Pi(\mathcal{D})$ with time steps ranging from 0 to $n$, combined with the set $val(\sigma_0, 0) \cup \{occurs(a_i, i) : 0 \leq i < n\}$. To prove the Lemma, it is sufficient to show that every answer set $AS'$ of $\mathcal{P}_n$ defines a path of $\tau(\mathcal{D})$. This proof is by induction on $n$.

1. The base case ($n = 0$) is immediate since $val(\sigma_0, 0)$ is a subset of an answer set $AS'$ of $\mathcal{P}_0$, and hence the path $\langle \sigma_0 \rangle$ is defined by $AS'$.

2. Assume that the lemma holds for $\mathcal{P}_{n-1}$ and show that it holds for $\mathcal{P}_n$.

The program can be represented as $\mathcal{P}_n = R_0 \cup R$. Let us denote the set of literals occurring in the heads of rules from $R_0$ by $U$. It can be checked that $U$ is a splitting set of $\mathcal{P}_n$ and therefore, by the Splitting Set Theorem, every answer set $AS'$ of $\mathcal{P}_n$ can be represented as $AS' = AS_0 \cup AS_1$, where $AS_0$ is an answer set of $R_0$ and $AS_1$ is an answer set of the partial evaluation of $R$ with respect to $U$ and $AS_0$.

Let us first show that $AS_0$ defines a single transition path $\langle \sigma_0, a_0, \sigma_1 \rangle$ of $\tau(\mathcal{D})$. The definition of transition and the fact that $\sigma_0$ is a state of $\tau(\mathcal{D})$ imply that the only thing which needs proving is that $\sigma_1 = \{f(\bar{x}) = y : val(f(\bar{x}), y, 1) \in AS_0\}$ is a state. The proof consists of the following two steps:

1. Show that $\sigma_1$ is an interpretation, i.e., for every $f(\bar{x})$ there is $y$ such that $f(\bar{x}) = y \in \sigma_1$.

   (a) $R_0$ contains a rule:

   $$val(f(\bar{x}), y_1, 1) \text{ or } \ldots \text{ or } val(f(\bar{x}), y_k, 1) \leftarrow val(body, 0),\ occurs(a, 0). \qquad (29)$$

   or a rule:

   $$val(f(\bar{x}), y, 1) \leftarrow val(body, 1) \qquad (30)$$

   whose body is satisfied by $AS_0$. To satisfy such rules, $AS_0$ must contain $val(f(\bar{x}), y, 1)$ for some value $y$.

   (b) If case (a) above does not hold, then consider two cases: $f$ is basic and $f$ is defined. To deal with the first case, note that, since $\sigma_0$ is a state, there is some $y$ such that $f(\bar{x}) = y \in \sigma_0$. Then, by the Inertia Axiom, $f(\bar{x})$ will have the same value at step one, i.e., $val(f(\bar{x}), y, 1)$ is in $AS_0$. If $f$ is defined, then $val(f(\bar{x}), false, 1)$ is in $AS_0$ due to the CWA for defined fluents. In both cases, $f(\bar{x}) = y \in \sigma_1$ for some $y$.

2. Uniqueness of an answer set of $\Pi_c(\mathcal{D}) \cup \sigma_1^{nd}$ follows from the well-foundedness of $\mathcal{D}$. Thus, $\sigma_1$ is a state and $\langle \sigma_0, a_0, \sigma_1 \rangle$ is a transition defined by $AS_0$.
To complete the induction, we notice that, by the Splitting Set Theorem, $AS'$ is an answer set of $\mathcal{P}_n$ iff it is an answer set of $AS_0 \cup R$. The latter program can be represented as $(AS_0 \setminus val(\sigma_1, 1)) \cup Q$, where $Q = val(\sigma_1, 1) \cup R$. Note that $Q$ is isomorphic to $\mathcal{P}_{n-1}$, where the isomorphism $\alpha$ is obtained by simply replacing a time step $i$ in each atom of $R$ by $i - 1$. By the inductive hypothesis and Observation 5 (above), we have that an answer set $AS_1$ of $Q$ defines a path $\langle \sigma_1, a_1, \ldots, \sigma_n \rangle$ in $\tau(\mathcal{D})$. Since, by the Splitting Set Theorem, $AS' = AS_0 \cup AS_1$, $AS'$ defines a path $\langle \sigma_0, a_0, \sigma_1, a_1, \ldots, \sigma_n \rangle$. This completes the proof of Lemma 1 and, hence, of the proposition.


B Proof of Proposition 2

In this section, we examine Proposition 2, which states that:

Let $\mathcal{D}_H$ and $\mathcal{D}_L$ be the coarse-resolution and fine-resolution system descriptions for the office domain in Example 1. Then $\mathcal{D}_L$ is a refinement of $\mathcal{D}_H$.

To establish this proposition, we need to establish that the relationship between $\mathcal{D}_H$ (Example 1) and $\mathcal{D}_L$ (Section [?]) satisfies the conditions of Definition 7, as stated below.

System description $\mathcal{D}_L$ is a refinement of system description $\mathcal{D}_H$ if:

1. States of $\tau_L$ are the refinements of states of $\tau_H$.

2. For every transition $\langle \sigma_1, a^H, \sigma_2 \rangle$ of $\tau_H$, every fluent $f$ in a set $F$ of observable fluents, and every refinement $\delta_1$ of $\sigma_1$, there is a path $P$ in $\tau_L$ from $\delta_1$ to a refinement $\delta_2$ of $\sigma_2$ such that:

   (a) Every action of $P$ is executed by the robot which executes $a^H$.

   (b) Every state of $P$ is a refinement of $\sigma_1$ or $\sigma_2$, i.e., no unrelated fluents are changed.

   (c) $observed(R, f, Y) = true \in \delta_2$ iff $(f = Y) \in \delta_2$, and $observed(R, f, Y) = false \in \delta_2$ iff $(f \neq Y) \in \delta_2$.

The first condition in Definition [7] is that states of T/ be refinements of states of T h, as specified by Definition [6] To 
establish this condition, let 8 be a state of t 1 , i.e., a unique answer set of a program 11/ with statements such as: 

com ponent (ci, office ) 


next Jo(c\ ,( 2 ), nextJo{c 2 ,cs) 


loc(rob\) =c 1 , loc(cupi) =C(, 

which include atoms for (a) cells that are components of rooms; (b) adjacent cells that are accessible from each other; 
and (c) initial locations of the robot and a coffee cup. Program IT/, has axioms for loc such as: 

loc(cup1) = C if loc(rob1) = C, in_hand(rob1, cup1)
loc*(cup1) = Rm if loc(cup1) = C, component(C, Rm)
loc*(rob1) = Rm if loc(rob1) = C, component(C, Rm)

defined (here) for the coffee cup cup1, and axioms for the static relation next_to such as:

next_to(C2, C1) if next_to(C1, C2)

next_to*(Rm1, Rm2) if next_to(C1, C2), component(C1, Rm1), component(C2, Rm2), Rm1 ≠ Rm2

The program $\Pi_L$ also includes axioms related to the basic knowledge fluents corresponding to observations, such as can_be_tested(rob1, loc(cup1), C), and the CWA for defined knowledge fluents such as may_discover(rob1, loc*(cup1), Rm). Please see refined.sp at https://github.com/mhnsrdhrn/refine-arch for an example of the complete program (in SPARC), with additional axioms for planning.
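To make the effect of these axioms concrete, the Python sketch below forward-chains the loc, loc*, next_to and next_to* rules above over a handful of facts. It is an illustration, not the SPARC program: the room layout (c1 and c2 in the office, c5 and c6 in the kitchen, with c2 adjacent to c5) is assumed for the example rather than read off Figure 3, the cup rule is generalized to any robot and object, and the rules are applied once in dependency order rather than by an ASP solver.

```python
# Assumed layout for illustration only (not read off Figure 3).
component = {"c1": "office", "c2": "office", "c5": "kitchen", "c6": "kitchen"}
next_to = {("c1", "c2"), ("c2", "c5"), ("c5", "c6")}


def apply_axioms(loc, in_hand, component, next_to):
    """Apply the loc, loc*, next_to and next_to* rules once, in dependency order."""
    loc = dict(loc)
    # loc(cup1) = C if loc(rob1) = C, in_hand(rob1, cup1), generalized to any pair
    for robot, obj in in_hand:
        loc[obj] = loc[robot]
    # loc*(X) = Rm if loc(X) = C, component(C, Rm)
    loc_star = {thing: component[cell] for thing, cell in loc.items()}
    # next_to(C2, C1) if next_to(C1, C2)
    sym = set(next_to) | {(b, a) for a, b in next_to}
    # next_to*(Rm1, Rm2) if next_to(C1, C2), component(C1, Rm1),
    #                       component(C2, Rm2), Rm1 != Rm2
    next_to_star = {(component[a], component[b])
                    for a, b in sym if component[a] != component[b]}
    return loc, loc_star, sym, next_to_star


loc, loc_star, sym, next_to_star = apply_axioms(
    {"rob1": "c2", "cup1": "c6"}, set(), component, next_to)
print(loc_star)              # {'rob1': 'office', 'cup1': 'kitchen'}
print(sorted(next_to_star))  # [('kitchen', 'office'), ('office', 'kitchen')]
```

Only the cell-level adjacency c2/c5 crosses a room boundary here, which is why next_to* relates exactly the office and the kitchen.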

We need to show that there is a state $\sigma \in \tau_H$ such that $\delta$ is a refinement of $\sigma$. We will do so by construction. Let $\sigma$ be the union of two sets of atoms:

$$\sigma = \sigma_m \cup \sigma_{nm} \tag{31}$$

$$\sigma_m = \{ f = y : f \text{ is a magnified coarse-resolution domain property}, (f^* = y) \in \delta \}$$

$$\sigma_{nm} = \{ f = y : f \text{ is a non-magnified coarse-resolution domain property}, (f = y) \in \delta \}$$

where $\sigma_m$ is a collection of atoms of domain properties that are magnified during refinement, and $\sigma_{nm}$ is a collection of atoms of domain properties that are not magnified during refinement. Continuing with our example in the context of Figure 3, $\sigma_m$ includes {loc(rob1) = r1, loc(cup1) = r2, next_to(r1, r2)}, obtained from loc*(rob1) = r1, loc*(cup1) = r2 and next_to*(r1, r2) in $\delta$, where r1 = office and r2 = kitchen, and $\sigma_{nm}$ includes ¬in_hand(rob1, cup1). Based on Definition 6, $\delta$ is a refinement of $\sigma$. We now need to establish that $\sigma$ is a state, i.e., an answer set of the program $\Pi_H$ that consists of all the facts in $\sigma$ and the axioms of the coarse-resolution system description. To do so, we need to establish that the facts in $\sigma$ satisfy the axioms in $\Pi_H$. This holds by construction of $\sigma$ and of the system description, thus establishing the first condition of Definition 7.
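The construction in Equation (31) amounts to a simple abstraction function. The Python sketch below assumes that atoms are encoded as (property name, arguments) pairs and that a starred name such as loc* denotes the coarse-resolution counterpart of a magnified property; the helper name `abstract_state` is hypothetical. Fine-resolution-only atoms, such as directly_observed, are dropped because they belong to neither $\sigma_m$ nor $\sigma_{nm}$.

```python
from typing import Dict, Hashable, Set, Tuple

# Assumed encoding: an atom is (property name, arguments); a name ending in "*"
# is the coarse-resolution counterpart of a magnified property.
Atom = Tuple[str, Tuple[str, ...]]


def abstract_state(delta: Dict[Atom, Hashable], magnified: Set[str],
                   non_magnified: Set[str]) -> Dict[Atom, Hashable]:
    """Return sigma, the union of sigma_m and sigma_nm as in Equation (31)."""
    # sigma_m: f = y for each magnified f with (f* = y) in delta
    sigma_m = {(name[:-1], args): value
               for (name, args), value in delta.items()
               if name.endswith("*") and name[:-1] in magnified}
    # sigma_nm: f = y for each non-magnified coarse property f with (f = y) in delta
    sigma_nm = {(name, args): value
                for (name, args), value in delta.items()
                if name in non_magnified}
    return {**sigma_m, **sigma_nm}


delta = {
    ("loc", ("rob1",)): "c1", ("loc*", ("rob1",)): "office",
    ("loc", ("cup1",)): "c6", ("loc*", ("cup1",)): "kitchen",
    ("next_to*", ("office", "kitchen")): True,
    ("in_hand", ("rob1", "cup1")): False,
    ("directly_observed", ("rob1", "loc(rob1)", "c1")): True,
}
sigma = abstract_state(delta, {"loc", "next_to"}, {"in_hand"})
for atom, value in sigma.items():
    print(atom, value)
```

The cell-level atoms loc(rob1) = c1 and loc(cup1) = c6 are not copied into $\sigma$; only their starred, room-level counterparts are.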

Next, we establish the second condition in Definition 7. We do so by construction, considering two representative transitions; other transitions can be addressed in a similar manner. In our discussion below, descriptions of states omit negative literals for simplicity. We also often omit atoms formed of statics, fluents unchanged by the transition, knowledge fluents, and fluents whose values are unknown. Furthermore, unless otherwise stated, the set $F$ of observable fluents includes all fluents.

First, consider the transition corresponding to action $a^H$ = move(rob1, kitchen), which moves the robot from the office to the kitchen in Figure 3. Considering just the fluent that changes due to this transition, {loc(rob1) = office} $\subseteq \sigma_1$ and {loc(rob1) = kitchen} $\subseteq \sigma_2$. Without loss of generality, consider a refinement $\delta_1$ of $\sigma_1$ that has atoms such as:

loc*(rob1) = r1
loc(rob1) = c2

directly_observed(rob1, loc(rob1), c2)
directly_observed(rob1, loc(rob1), c5) = undet

which implies that the robot is in cell c2 in room r1, which is the office, and the value of directly_observed is undetermined for all its parameters other than the current location of the robot. Our description of $\delta_1$ omits atoms such as next_to(c1, c2), next_to(c5, c6), and atoms corresponding to some other fluents.

Now, we construct a path $P_1$ from $\delta_1$ that corresponds to two state transitions. The first transition corresponds to executing action move(rob1, c5), changing the state from $\delta_1$ to $\delta_{1,a}$, which has atoms such as:

$\delta_{1,a}$ = {loc(rob1) = c5, loc*(rob1) = r2}

and corresponds to the robot's movement from cell c2 in room r1 to the neighboring cell c5 in room r2. The second transition of the constructed path $P_1$ corresponds to action test(rob1, loc(rob1), c5), which causes a transition to state:

$\delta_2$ = {loc(rob1) = c5, loc*(rob1) = r2, directly_observed(rob1, loc(rob1), c5), observed(rob1, loc(rob1), c5), observed(rob1, loc*(rob1), r2)}

where the newly added atoms correspond to observing the robot's location. We observe that $\delta_2$ is a state of $\tau_L$ and a refinement of state $\sigma_2$ of $\tau_H$ (see Definition 6). To establish the second condition of Definition 7, we need to show that the chosen coarse-resolution transition, the constructed path $P_1$ from $\tau_L$, the set $F$ of observable fluents, and the refinement $\delta_1$ of $\sigma_1$ satisfy requirements (a)-(c). To establish requirement (a), notice that every action of path $P_1$ is executed by rob1, the robot that executes the coarse-resolution action $a^H$. To establish requirement (b), notice that state $\delta_{1,a}$ is a refinement of $\sigma_2$; since we already know that $\delta_1$ and $\delta_2$ are refinements of $\sigma_1$ and $\sigma_2$ respectively, every state of $P_1$ is a refinement of $\sigma_1$ or $\sigma_2$, and unrelated fluents remain unchanged. Finally, to establish requirement (c), notice that Statements 21, 23, and 24 (in the construction of $\mathcal{D}_L$) ensure that for each $f \in F$, observed(rob1, f, Y) = true $\in \delta_2$ iff (f = Y) $\in \delta_2$, and observed(rob1, f, Y) = false $\in \delta_2$ iff (f $\neq$ Y) $\in \delta_2$.
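Requirements (a) and (b) for path $P_1$ can be replayed in a few lines. The Python sketch below keeps only the atoms listed above for $\delta_1$, $\delta_{1,a}$ and $\delta_2$ (with r1 = office and r2 = kitchen), reads the coarse-resolution location off the magnified fluent loc*, and checks that every action is executed by rob1 and that each state of the path maps to $\sigma_1$ or $\sigma_2$. The string encoding and the helper names are assumptions made for illustration.

```python
# States of path P1, restricted to the atoms listed in the text (r1 = office, r2 = kitchen).
delta_1 = {"loc*(rob1)": "r1", "loc(rob1)": "c2"}
delta_1a = {"loc*(rob1)": "r2", "loc(rob1)": "c5"}
delta_2 = {"loc*(rob1)": "r2", "loc(rob1)": "c5",
           "observed(rob1, loc(rob1), c5)": True,
           "observed(rob1, loc*(rob1), r2)": True}
sigma_1 = {"loc(rob1)": "r1"}  # robot in the office
sigma_2 = {"loc(rob1)": "r2"}  # robot in the kitchen

# Actions of P1 as (action name, executing robot, further arguments...).
P1_actions = [("move", "rob1", "c5"), ("test", "rob1", "loc(rob1)", "c5")]
P1_states = [delta_1, delta_1a, delta_2]


def coarse_view(delta):
    """loc is magnified, so its coarse value is read off loc*."""
    return {"loc(rob1)": delta["loc*(rob1)"]}


def check_requirements_a_b(actions, states, robot, coarse_states):
    same_robot = all(action[1] == robot for action in actions)       # (a)
    refines = all(coarse_view(d) in coarse_states for d in states)  # (b)
    return same_robot, refines


print(check_requirements_a_b(P1_actions, P1_states, "rob1", [sigma_1, sigma_2]))
# (True, True)
```

The same check applies to path $P_2$ below once its four states and the coarse states $\sigma_3$ and $\sigma_4$ are encoded in the same way.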

As a second representative example of a transition, consider the robot, which is in the kitchen, executing $a^H$ = grasp(rob1, cup1) to pick up the coffee cup cup1 that is known to be in the kitchen. Considering just the fluents relevant to this transition, the state $\sigma_3 \in \tau_H$ has {loc(rob1) = kitchen, loc(cup1) = kitchen}. Now, consider a refinement $\delta_3$ of $\sigma_3$ that has atoms such as:

loc*(rob1) = r2, loc(rob1) = c5
loc*(cup1) = r2, loc(cup1) = c6
directly_observed(rob1, loc(rob1), c5)
directly_observed(rob1, loc(rob1), c6) = undet

which implies that the robot is in cell c5 in room r2, coffee cup cup1 is in cell c6 in r2, and the value of directly_observed is undetermined for all its parameters other than the current locations of the robot and the coffee cup.

Now, consider path $P_2$ from $\tau_L$ that corresponds to four state transitions. The first transition involves executing action move(rob1, c6), which changes the state from $\delta_3$ to $\delta_{3,a}$, which has atoms such as:

$\delta_{3,a}$ = {loc(rob1) = c6, loc*(rob1) = r2, loc(cup1) = c6, loc*(cup1) = r2}

and corresponds to the robot moving from cell c5 to cell c6 in room r2. The second transition corresponds to executing action test(rob1, loc(rob1), c6), which changes the state to $\delta_{3,b}$, which has atoms such as:

$\delta_{3,b}$ = {loc(rob1) = c6, loc*(rob1) = r2, loc(cup1) = c6, loc*(cup1) = r2, directly_observed(rob1, loc(rob1), c6), observed(rob1, loc(rob1), c6), observed(rob1, loc*(rob1), r2)}

where the newly added atoms correspond to observing the robot's location. The third transition along path $P_2$ corresponds to action grasp(rob1, cup1), which changes the state to $\delta_{3,c}$, which has atoms such as:

$\delta_{3,c}$ = {loc(rob1) = c6, loc*(rob1) = r2, loc(cup1) = c6, loc*(cup1) = r2, in_hand(rob1, cup1), directly_observed(rob1, loc(rob1), c6), observed(rob1, loc(rob1), c6), observed(rob1, loc*(rob1), r2)}

where cup1 is now in the robot's grasp. Finally, action test(rob1, in_hand(rob1, cup1), true) causes a transition to state $\delta_4$, which has atoms such as:

$\delta_4$ = {loc(rob1) = c6, loc*(rob1) = r2, loc(cup1) = c6, loc*(cup1) = r2, in_hand(rob1, cup1), directly_observed(rob1, loc(rob1), c6), observed(rob1, loc(rob1), c6), observed(rob1, loc*(rob1), r2), directly_observed(rob1, in_hand(rob1, cup1), true), observed(rob1, in_hand(rob1, cup1), true)}

We observe that $\delta_4$ is a state of $\tau_L$ and a refinement of state $\sigma_4$ of $\tau_H$. To establish the second condition of Definition 7, we need to show that the chosen coarse-resolution transition, the constructed path $P_2$ from $\tau_L$, the set $F$ of observable fluents, and the refinement $\delta_3$ of $\sigma_3$ satisfy requirements (a)-(c). To establish requirement (a), notice that every action of $P_2$ is executed by rob1, the robot that executes $a^H$. Next, to establish requirement (b), notice that $\delta_{3,a}$ and $\delta_{3,b}$ are refinements of $\sigma_3$, while $\delta_{3,c}$ and $\delta_4$ are refinements of $\sigma_4$, i.e., every state of $P_2$ is a refinement of $\sigma_3$ or $\sigma_4$ and unrelated fluents remain unchanged. Finally, to establish requirement (c), notice that Statements 21, 23, and 24 (in the construction of $\mathcal{D}_L$) ensure that for each $f \in F$, observed(rob1, f, Y) = true $\in \delta_4$ iff (f = Y) $\in \delta_4$, and observed(rob1, f, Y) = false $\in \delta_4$ iff (f $\neq$ Y) $\in \delta_4$.

The steps described above can be repeated for other coarse-resolution transitions. We have thus shown that $\mathcal{D}_L$ is indeed a refinement of $\mathcal{D}_H$. These steps can also be verified by solving refined.sp after suitably revising the initial state and goal state.

C POMDP Construction Example


In this section, we provide an illustrative example of constructing a POMDP for a specific abstract action that needs 
to be implemented as a sequence of concrete actions whose effects are modeled probabilistically. 

Example 7. [Example of POMDP construction] 

Consider abstract action $a^H$ = grasp(rob1, tb1), with the robot and textbook in the office, in the context of Example^. The corresponding zoomed system description $\mathcal{D}_{LR}(T)$ is in Example 6. For ease of explanation, assume the following transition probabilities, observation probabilities, and rewards; these values would typically be computed by the robot in the initial training phase (Section |7T2|):

• Any move from a cell to a neighboring cell succeeds with probability 0.85. Since there are only two cells in this room, the robot remains in the same cell if a move does not succeed.

• The grasp action succeeds with probability 0.95; otherwise it fails.

• If the thing being searched for exists in a cell, the robot finds it there with probability 0.95.

• All non-terminal actions have unit cost. A correct answer receives a large positive reward (100), whereas an incorrect answer receives a large negative reward (-100).
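The transition matrices for move-0, move-1 and grasp in the listing below follow directly from these probabilities. As a hedged sketch (the tuple encoding (robot cell, object cell, object in hand) and the function name are our assumptions), the Python code below regenerates them over the seven p-states in the order of the listing's `states:` line. Test actions are treated as leaving the p-state unchanged, and finish is deliberately not modelled because the listing specifies its transition separately.

```python
P_MOVE = 0.85   # a move to the neighboring cell succeeds
P_GRASP = 0.95  # grasp succeeds when robot and object share a cell

# (robot cell, object cell, object in hand), in the order of the `states:` line.
STATES = [(0, 0, True), (1, 1, True), (0, 0, False), (0, 1, False),
          (1, 0, False), (1, 1, False), "absb"]
INDEX = {s: i for i, s in enumerate(STATES)}


def transition_matrix(action):
    if action == "finish":
        raise ValueError("finish is specified separately in the listing")
    n = len(STATES)
    T = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(STATES):
        if s == "absb":
            T[i][i] = 1.0  # absorbing state
            continue
        robot, obj, inhand = s
        if action in ("move-0", "move-1"):
            target = int(action[-1])
            if robot == target:
                T[i][i] = 1.0  # already in the target cell
            else:
                # A held object moves with the robot; a failed move leaves it in place.
                moved = (target, target if inhand else obj, inhand)
                T[i][INDEX[moved]] = P_MOVE
                T[i][i] = 1.0 - P_MOVE
        elif action == "grasp" and not inhand and robot == obj:
            T[i][INDEX[(robot, obj, True)]] = P_GRASP
            T[i][i] = 1.0 - P_GRASP
        else:
            T[i][i] = 1.0  # grasp with unmet precondition, or a test action
    return T


for row in transition_matrix("move-0"):
    print(" ".join(f"{p:g}" for p in row))
```

Printing with `:g` reproduces the listing's layout for move-0; the other actions can be printed the same way for comparison.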

The elements of the corresponding POMDP are described below in the format of the approximate POMDP solver used in our experiments [37]. As described in Section [iO], please note that:

• Executing a terminal action causes a transition to a terminal state. 

• Actions that change the p-state do not provide any observations. 

• Knowledge-producing actions do not change the p-state. 

discount: 0.99

values: reward

% States, actions and observations as enumerated lists

states: robot-0-object-0-inhand robot-1-object-1-inhand robot-0-object-0-not-inhand
robot-0-object-1-not-inhand robot-1-object-0-not-inhand
robot-1-object-1-not-inhand absb

actions: move-0 move-1 grasp test-robot-0 test-robot-1 test-object-0 test-object-1
test-inhand finish

observations: robot-found robot-not-found object-found object-not-found
inhand not-inhand none


% Transition function format
% T : action : S x S' -> [0, 1]
% Probability of transition from first element of S to that of S' is
% in the top left corner of each matrix
T: move-0
1 0 0 0 0 0 0
0.85 0.15 0 0 0 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0.85 0 0.15 0 0
0 0 0 0.85 0 0.15 0
0 0 0 0 0 0 1

T: move-1
0.15 0.85 0 0 0 0 0
0 1 0 0 0 0 0
0 0 0.15 0 0.85 0 0
0 0 0 0.15 0 0.85 0
0 0 0 0 1 0 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1

T: grasp
1 0 0 0 0 0 0
0 1 0 0 0 0 0
0.95 0 0.05 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0.95 0 0 0 0.05 0
0 0 0 0 0 0 1

T: test-robot-0
identity

T: test-robot-1
identity

T: test-object-0
identity

T: test-object-1
identity

T: test-inhand
identity

T: finish
uniform


% Observation function format(s)
% O : action : s_i : z_i -> [0, 1] (or)
% O : action : S x Z -> [0, 1]
% In each matrix, first row provides probability of each possible
% observation in the first p-state in S
O: move-0 : * : none 1
O: move-1 : * : none 1
O: grasp : * : none 1

O: test-robot-0
0.95 0.05 0 0 0 0 0
0.05 0.95 0 0 0 0 0
0.95 0.05 0 0 0 0 0
0.95 0.05 0 0 0 0 0
0.05 0.95 0 0 0 0 0
0.05 0.95 0 0 0 0 0
0 0 0 0 0 0 1

O: test-robot-1
0.05 0.95 0 0 0 0 0
0.95 0.05 0 0 0 0 0
0.05 0.95 0 0 0 0 0
0.05 0.95 0 0 0 0 0
0.95 0.05 0 0 0 0 0
0.95 0.05 0 0 0 0 0
0 0 0 0 0 0 1

O: test-object-0
0 0 0.95 0.05 0 0 0
0 0 0.05 0.95 0 0 0
0 0 0.95 0.05 0 0 0
0 0 0.05 0.95 0 0 0
0 0 0.95 0.05 0 0 0
0 0 0.05 0.95 0 0 0
0 0 0 0 0 0 1

O: test-object-1
0 0 0.05 0.95 0 0 0
0 0 0.95 0.05 0 0 0
0 0 0.05 0.95 0 0 0
0 0 0.95 0.05 0 0 0
0 0 0.05 0.95 0 0 0
0 0 0.95 0.05 0 0 0
0 0 0 0 0 0 1

O: test-inhand
0 0 0 0 0.95 0.05 0
0 0 0 0 0.95 0.05 0
0 0 0 0 0.05 0.95 0
0 0 0 0 0.05 0.95 0
0 0 0 0 0.05 0.95 0
0 0 0 0 0.05 0.95 0
0 0 0 0 0 0 1

O: finish : * : none 1






% Reward function format
% R : action : s_i : s_i' : real value
R: finish : robot-0-object-0-inhand : * : -100
R: finish : robot-1-object-1-inhand : * : 100
R: finish : robot-0-object-0-not-inhand : * : -100
R: finish : robot-0-object-1-not-inhand : * : -100
R: finish : robot-1-object-0-not-inhand : * : -100
R: finish : robot-1-object-1-not-inhand : * : -100
R: move-0 : * : * : -1
R: move-1 : * : * : -1
R: grasp : * : * : -1
R: test-robot-0 : * : * : -1
R: test-robot-1 : * : * : -1
R: test-object-0 : * : * : -1
R: test-object-1 : * : * : -1
R: test-inhand : * : * : -1